Tengo varios cientos de marcos de datos con los mismos nombres de columna, como este:

Df1

        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28269  0.07365  22.16080  4050.311360   
1    4208.98    5  0.48122  0.08765  44.90035  4208.972962   
2    4374.94    9  0.71483  0.11429  86.96497  4374.927110   
3    4379.74    9  0.31404  0.09107  30.44271  4379.760601   
4    4398.01   14  0.50415  0.09845  52.83236  4398.007473 
5    5520.50    1  0.06148  0.12556   8.21685  5520.484742 

Df2

        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28616  0.07521  22.91064  4050.327388   
1    4208.98    6  0.48781  0.08573  44.51609  4208.990029   
2    4374.94    9  0.71548  0.11437  87.10152  4374.944513   
3    4379.74   10  0.31338  0.09098  30.34791  4379.778009   
4    4398.01   15  0.49950  0.08612  45.78707  4398.020367   
5    4502.21    9  0.56362  0.10114  60.67868  4502.223123   
6    4508.28    3  0.69554  0.11600  85.88428  4508.291777   
7    4512.99    2  0.20486  0.08891  19.38745  4512.999332
8    5520.50    1  0.06148  0.12556   8.21685  5520.484742

Así es como las estoy leyendo.

path_to_files = '/home/Desktop/computed_2d/'
lst = []

for filen in dir1:
   df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')

   lst.append(df)

El resultado deseado debería verse así:

      wave   num   stlines      fwhm        EWs  MeasredWave
0  4050.32   3.0  0.284425  0.074430  22.535720  4050.319374
1  4208.98   5.5  0.484515  0.086690  44.708220  4208.981496
2  4374.94   9.0  0.715155  0.114330  87.033245  4374.935812
3  4379.74   9.5  0.313710  0.091025  30.395310  4379.769305
4  4398.01  14.5  0.501825  0.092285  49.309715  4398.013920
5  4502.21    9   0.56362   0.10114   60.67868   4502.223123   
6  4508.28    3   0.69554   0.11600   85.88428   4508.291777   
7  4512.99    2   0.20486   0.08891   19.38745   4512.999332
8  5520.50   1.0  0.061480  0.125560  8.216850   5520.484742

Como puede ver, el número de filas no es el mismo. Ahora quiero tomar el promedio de todos los marcos de datos basados en la columna1 wave y quiero asegurarme de que cada índice de la columna wave de df1 se agregue al índice correcto de { {X3}}

-1
user10337478 16 sep. 2018 a las 15:42

3 respuestas

La mejor respuesta

Puede apilar todos los marcos de datos en uno usando pd.concat wich axis = 1 y tomar el promedio de la columna respectiva

df3 = pd.merge(df1,df2,on=['wave'],how ='outer',)
df4 = df3.rename(columns = lambda x: x.split('_')[0]).T
df4.groupby(df4.index).mean().T

Fuera:

    EWs         MeasredWave fwhm        num stlines      wave
0   22.535720   4050.319374 0.074430    3.0 0.284425    4050.32
1   44.708220   4208.981496 0.086690    5.5 0.484515    4208.98
2   87.033245   4374.935812 0.114330    9.0 0.715155    4374.94
3   30.395310   4379.769305 0.091025    9.5 0.313710    4379.74
4   49.309715   4398.013920 0.092285    14.5 0.501825   4398.01
5   8.216850    5520.484742 0.125560    1.0 0.061480    5520.50
6   60.678680   4502.223123 0.101140    9.0 0.563620    4502.21
7   85.884280   4508.291777 0.116000    3.0 0.695540    4508.28
8   19.387450   4512.999332 0.088910    2.0 0.204860    4512.9
1
Naga Kiran 16 sep. 2018 a las 15:21

La respuesta de @Naga Kiran es genial. Actualicé toda la solución aquí:

import pandas as pd

df1 = pd.DataFrame(
  {'wave'        : [4050.32, 4208.98, 4374.94, 4379.74, 4398.01, 5520.50],
   'num'         : [3, 5, 9, 9, 14, 1],
   'stlines'     : [0.28269, 0.48122, 0.71483, 0.31404, 0.50415, 0.06148],
   'fwhm'        : [0.07365, 0.08765, 0.11429, 0.09107, 0.09845, 0.12556],
   'EWs'         : [22.16080, 44.90035, 86.96497, 30.44271, 52.83236, 8.21685],
   'MeasredWave' : [4050.311360, 4208.972962, 4374.927110, 4379.760601, 4398.007473, 5520.484742]},
   index=[0, 1, 2, 3, 4, 5])

df2 = pd.DataFrame(
  {'wave'        : [4050.32, 4208.98, 4374.94, 4379.74, 4398.01, 4502.21, 4508.28, 4512.99, 5520.50],
   'num'         : [3, 6, 9, 10, 15, 9, 3, 2, 1],
   'stlines'     : [0.28616, 0.48781, 0.71548, 0.31338, 0.49950, 0.56362, 0.69554, 0.20486, 0.06148],
   'fwhm'        : [0.07521, 0.08573, 0.11437, 0.09098, 0.08612, 0.10114, 0.11600, 0.08891, 0.12556],
   'EWs'         : [22.91064, 44.51609, 87.10152, 30.34791, 45.78707, 60.67868, 85.88428, 19.38745, 8.21685],
   'MeasredWave' : [4050.327388, 4208.990029, 4374.944513, 4379.778009, 4398.020367, 4502.223123, 4508.291777, 4512.999332, 5520.484742]},
   index=[0, 1, 2, 3, 4, 5, 6, 7, 8])

df3 = pd.merge(df1, df2, on='wave', how='outer')
df4 = df3.rename(columns = lambda x: x.split('_')[0]).T
df5 = df4.groupby(df4.index).mean().T
df6 = df5[['wave', 'num', 'stlines', 'fwhm', 'EWs', 'MeasredWave']]
df7 = df6.sort_values('wave', ascending = True).reset_index(drop=True)
df7
1
yoonghm 16 sep. 2018 a las 17:06

Aquí hay un ejemplo para hacer lo que necesita:

import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 2, 3],
                    'B': [0, 1, 2, 3],
                    'C': [0, 1, 2, 3],
                    'D': [0, 1, 2, 3]},
                    index=[0, 1, 2, 3])

df2 = pd.DataFrame({'A': [4, 5, 6, 7],
                    'B': [4, 5, 6, 7],
                    'C': [4, 5, 6, 7],
                    'D': [4, 5, 6, 7]},
                    index=[0, 1, 2, 3])

df3 = pd.DataFrame({'A': [8, 9, 10, 11],
                    'B': [8, 9, 10, 11],
                    'C': [8, 9, 10, 11],
                    'D': [8, 9, 10, 11]},
                    index=[0, 1, 2, 3])

df4 =  pd.concat([df1, df2, df3])
df5 = pd.concat([df1, df2, df3], ignore_index=True)
print(df4)
print('\n\n')
print(df5)

print(f"Average of column A = {df4['A'].mean()}")

Usted tendrá

    A   B   C   D
0   0   0   0   0
1   1   1   1   1
2   2   2   2   2
3   3   3   3   3
0   4   4   4   4
1   5   5   5   5
2   6   6   6   6
3   7   7   7   7
0   8   8   8   8
1   9   9   9   9
2  10  10  10  10
3  11  11  11  11



     A   B   C   D
0    0   0   0   0
1    1   1   1   1
2    2   2   2   2
3    3   3   3   3
4    4   4   4   4
5    5   5   5   5
6    6   6   6   6
7    7   7   7   7
8    8   8   8   8
9    9   9   9   9
10  10  10  10  10
11  11  11  11  11

Average of column A = 5.5
1
yoonghm 16 sep. 2018 a las 13:06