1

I have the following dataframe and would like to fill in missing values.

mukey   hzdept_r    hzdepb_r    sandtotal_r silttotal_r
425897      0         61        
425897      61        152          5.3         44.7
425911      0         30           30.1        54.9
425911      30        74           17.7        49.8
425911      74        84        

I want each missing value to be the average of values corresponding to that mukey. In this case, e.g. the first row missing values will be the average of sandtotal_r and silttotal_r corresponding to mukey==425897. pandas fillna doesn't seem to do the trick. Any help?

user308827
  • 21,227
  • 87
  • 254
  • 417

2 Answers2

3

Using what I just learned a couple questions below....

FYI, this solution will still leave NaN's for any 'Mukey's that don't have any 'sandtotal_r's or 'silttotal_r's.

import pandas as pd

df = pd.read_clipboard()

df1 = df.set_index('mukey')

df1.fillna(df.groupby('mukey').mean(),inplace=True)

df1.reset_index()

    mukey  hzdept_r  hzdepb_r  sandtotal_r  silttotal_r
0  425897         0        61          5.3        44.70
1  425897        61       152          5.3        44.70
2  425911         0        30         30.1        54.90
3  425911        30        74         17.7        49.80
4  425911        74        84         23.9        52.35
Liam Foley
  • 7,432
  • 2
  • 26
  • 24
  • Thank you so much Liam, any chance you can have a look at why the code fails on a larger dataset: http://stackoverflow.com/questions/29359134/pandas-fillna-failing-on-dataset – user308827 Mar 31 '15 at 01:33
0

The pandas groupBy method will do what you want. This method allows you to group the elements of a DataFrame by their label and then perform some aggregation function on them.

df = pd.read_clipboard()

df1= df.set_index("mukey")

grouped_df = df.groupby("mukey").mean() 

df1.fillna(grouped_df,inplace=True)

Setting the index to "mukey" is essential. Without this step, the final DataFrame will be unchanged.

Also see Splitting on Object into Groups for more info on how to use groupBy.

Code on the Rocks
  • 11,488
  • 3
  • 53
  • 61