0

Not sure if this question has been asked. But I want to replace NAN values in a data frame by merging it with another. The data frame contains NAN values in certain columns. I grouped these columns by values in an id column. So in other words, the sum of all values in col1, col2, col3 by an id number.

df_group1 = df.groupby('id')[['col1']].sum()
df_group2 = df.groupby('id')[['col2']].sum()
df_group3 = df.groupby('id')[['col3']].sum()

Then I merged those three dataframe together into one.

df_group = pd.concat([df_group1, df_group2, df_group3], axis = 1)

Subsequently, I divided those values by the length of rows containing the id_number

for i in df['id'].unique():
    df_group = df_group/len(df[df['id'] == i])

Now I want to merge this dataframe with df in order to replace the NAN values in df with those in df_group so if a row in df_group has id number 1111 and the respective col1 value is 200. I want to replace all NAN values in df for all rows with id 1111 with 200. What is the best way to do this?

EDIT: Say I have this data frame df_group(Image1), I want to replace all NANs in df(Image2) with those values in df_group based on id and column name

SampleDF

SampleDF2

  • So, I assume you have a df with Id and few columns with NaN values. and another df with Id and a column with non NaN values. You want to merge these 2 data frames and replace the NaN's in the first df with the values in the 2nd df. It would be better if you can post sample data set and the expected result. Otherwise we are only guessing. – davidbilla Jun 22 '20 at 13:44
  • @davidbilla I've edited the question with images of sample data frames – Vadim Katsemba Jun 22 '20 at 14:14
  • @VadimKatsemba please don't use images... have a look at https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples for how to make it easier for people to help you. – Jon Clements Jun 22 '20 at 14:22
  • @JonClements my bad, I wasn't sure how to display a data frame properly so I used a snipped screenshot – Vadim Katsemba Jun 22 '20 at 14:31

1 Answers1

0

Does this help? Basically for each column in col1, col2, col3, using np.where, I am trying to check if it is null in df and if it is null then get the matching column value given the id using mapfunction

for col in df.columns[1:]:
    df[col] = np.where(df[col].isnull(), df['id'].map(df_group[col]), df[col])

print(df)

Result:

   id   col1   col2   col3
0   4  150.0  501.0  287.0
1   2  318.0  177.0  138.0
2   1   96.0  301.0   90.0
3   1  233.0   81.0  400.0
4   3  194.0  334.0  402.0
davidbilla
  • 2,120
  • 1
  • 15
  • 26