15

I have a dataframe similar to below

id A   B   C   D E
1  2   3   4   5 5
1  NaN 4   NaN 6 7
2  3   4   5   6 6
2  NaN NaN 5   4 1

I want to do a null value imputation for columns A, B, C in a forward filling but for each group. That means, I want the forward filling be applied on each id. How can I do that?

DYZ
  • 55,249
  • 10
  • 64
  • 93
HHH
  • 6,085
  • 20
  • 92
  • 164

1 Answers1

24

Use GroupBy.ffill for forward filling per groups for all columns, but if first values per groups are NaNs there is no replace, so is possible use fillna and last casting to integers:

print (df)
   id    A    B    C  D    E
0   1  2.0  3.0  4.0  5  NaN
1   1  NaN  4.0  NaN  6  NaN
2   2  3.0  4.0  5.0  6  6.0
3   2  NaN  NaN  5.0  4  1.0

cols = ['A','B','C']
df[['id'] + cols] = df.groupby('id')[cols].ffill().fillna(0).astype(int)
print (df)
   id  A  B  C  D    E
0   1  2  3  4  5  NaN
1   1  2  4  4  6  NaN
2   2  3  4  5  6  6.0
3   2  3  4  5  4  1.0

Detail:

print (df.groupby('id')[cols].ffill().fillna(0).astype(int))
   id  A  B  C
0   1  2  3  4
1   1  2  4  4
2   2  3  4  5
3   2  3  4  5

Or:

cols = ['A','B','C']
df.update(df.groupby('id')[cols].ffill().fillna(0))
print (df)
   id    A    B    C  D    E
0   1  2.0  3.0  4.0  5  NaN
1   1  2.0  4.0  4.0  6  NaN
2   2  3.0  4.0  5.0  6  6.0
3   2  3.0  4.0  5.0  4  1.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    I want to do it on some columns only like you last example, but when I use you're code I lose other columns, e.g. D to E – HHH Dec 09 '18 at 21:30
  • @H.Z. - It seems you forget for `df[['id'] + cols] =` and using `df = ` only. – jezrael Dec 09 '18 at 21:33
  • 1
    I have that piece as well...it only adds one extra column to the end! – HHH Dec 09 '18 at 21:36
  • 1
    @H.Z. - Interesting, for me it working nice. Added alternative solution with `update`, can you check it? – jezrael Dec 09 '18 at 21:38
  • 4
    Does the `fillna(0)` do anything in this example? I cannot see any values that were NaNs and are now zeros, in fact I cannot see any zeros at all. Is it OK to miss that out? – dumbledad Mar 26 '20 at 17:00
  • @dumbledad no, it does indeed nothing as there are no zeros for index 0 – Guido May 08 '20 at 08:26
  • @jezrael: When I do this: `cols = ['SleepDate', 'InBedTimeFinal'] s3[['ID'] + cols] = s3.groupby('ID')[cols].ffill().fillna(0).astype(datetime64[ns])`. There is an error: `NameError: name 'datetime64` 'is not defined. What should I do? – Jason Jun 05 '20 at 09:12
  • @Jason - Use [this](https://stackoverflow.com/questions/41514173/change-multiple-columns-in-pandas-dataframe-to-datetime) solution – jezrael Jun 05 '20 at 09:13
  • @Jason - Not understand, then why convert? – jezrael Jun 05 '20 at 09:18
  • 1
    @jezrael: if I use this: `s3[['ID'] + cols] = s3.groupby('ID')[cols].ffill().fillna(0)`. There is an error: `ValueError: Columns must be same length as key`. – Jason Jun 05 '20 at 09:20
  • @Jason - hmmm, not idea, is possible see some data for test? – jezrael Jun 05 '20 at 09:35
  • @Jason had similar issue, I only had one column to fill so simply went with df['id'] = df.groupby('id')['my col'].ffill().fillna(0).astype(int) – A Rob4 Apr 26 '22 at 13:46
  • @jezrael your solution doesn't work anymore for pandas versions > 0.25, since `groupby.ffill` no longer returns group values, so you get the same error as @Jason. See [this question](https://stackoverflow.com/questions/58181262/groupby-with-ffill-deletes-group-and-does-not-put-group-in-index) for a workaround. – Simone Aug 31 '22 at 11:34