2

I am trying to fill in missing values in my dataframe. However I want to fill the missing columns with a groupby statement. So here is what my dataframe looks like...

Number    Other
1435       NaN
1435       NaN
1435       COOL
1817       NaN
1817       YES

So what I want to be able to do is basically just take the Max value or the last value that had data and fill the na for that specific number with that value..... So for example for 1435 I want to group it by number and then take the look for the max() in that column so it would find COOL and then fill all the NaN in the other column with COOL my final dataframe would look like this

Number    Other
1435       COOL
1435       COOL
1435       COOL
1817       YES
1817       YES

what I have tried so far.

df["Number"] = df["Number"].fillna(value=df.groupby(['Number'])["Other"].max())

as well as

df["Number"] = df["Number"].fillna(value=df.groupby(['Number'])["Other"].last())

I think what I need to do is possibly sort them and then use last to get the value, but I cant seem to figure out how to do this and return the results I am looking for. any help would be greatly appreciated thanks.

Angel
  • 35
  • 1
  • 7

3 Answers3

3

It really seems like all you want is groupby + bfill:

df['Other'] = df.groupby('Number').Other.bfill()
df

   Number Other
0    1435  COOL
1    1435  COOL
2    1435  COOL
3    1817   YES
4    1817   YES
cs95
  • 379,657
  • 97
  • 704
  • 746
1

I think it is the same problem as stated here:

Pandas Fill NA with Group Value If you want e.g. to use the most common value, you can do the following:

df = pd.DataFrame([[1435, np.nan],
                   [1435, 'YES'],
                   [1435, np.nan],
                   [1435, 'COOL'],
                   [1435, 'COOL'],
                   [1817, np.nan],
                   [1817, 'YES']],
                   columns=['Number', 'Other'])

df['Other'] = df.groupby('Number').transform(lambda x: x.fillna(x.value_counts().index[0]))

This will result in

    Number  Other
0   1435    COOL
1   1435    YES
2   1435    COOL
3   1435    COOL
4   1435    COOL
5   1817    YES
6   1817    YES

So combine the methods groupby, transform and fillna.

CodeZero
  • 1,649
  • 11
  • 18
0

for filling with last available value, try,

df["Number"]=df["Number"].bfill()
Pyd
  • 6,017
  • 18
  • 52
  • 109
  • That wont take into account if its the same Number, wouldnt that just take the last string or value in my dataframe? I want to be able to look at the numbers and groupby them and say if those Numbers are the same take the last value in that set or take the max value for that set and fill in the NaNs with the max for that specifc set of numbers. – Angel Nov 10 '17 at 16:30
  • @jezrael can you help – Pyd Nov 10 '17 at 16:31
  • If you need me to provide any other information I can, I am relatively new to the stackoverflow family. – Angel Nov 10 '17 at 17:52
  • Edit your question add pandas tag. It will be easy for other pandas expertises to see your problem – Pyd Nov 10 '17 at 18:01