Python: How to select certain columns by slicing for replacing NaN values after groupby?

Question

Assuming we have a df as follows

import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 2, 2, 2, 2],
                   'Col2': [5, 6, 8, 3, 7, 8, 5],
                   'Col3': [2, None, None, 3, None, None, 4],
                   'Col4': [3, None, 5, None, 8, None, 66],
                   'Col5': [None, 8, 6, None, 9, 6, None],
                   'Col6': [3, 5, 2, 5, 2, 7, 9]})

I want to replace the None values in columns Col3, Col4 and Col5 after grouping by the first column, Col1, using the solution suggested by jjs in this post.

This is how I did it:

df = df.groupby('Col1')[['Col3','Col4','Col5']].ffill().bfill()

but listing the columns manually is a lot of work.

So, how can I select the columns Col3, Col4 and Col5 by slicing instead?

Thanks

Something like using `.iloc[:, 2:5]` to get the desired columns – some_programmer, Mar 09 '19 at 16:18

score 2 · Answer 1 · answered Mar 09 '19 at 16:17

This solution selects every column that contains NaN and fills it the way you want:

df.groupby('Col1')[df.columns[df.isnull().any()]].ffill().bfill()
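Note that the line above returns only the selected columns. A minimal sketch of one way to keep the remaining columns, assuming the question's sample frame, is to assign the filled values back. The names `nan_cols` and `filled` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 2, 2, 2, 2],
                   'Col2': [5, 6, 8, 3, 7, 8, 5],
                   'Col3': [2, None, None, 3, None, None, 4],
                   'Col4': [3, None, 5, None, 8, None, 66],
                   'Col5': [None, 8, 6, None, 9, 6, None],
                   'Col6': [3, 5, 2, 5, 2, 7, 9]})

# Labels of the columns that contain at least one missing value
nan_cols = df.columns[df.isnull().any()].tolist()

# Work on a copy (illustrative) and write the filled columns back,
# so Col1, Col2 and Col6 stay in the frame untouched
filled = df.copy()
filled[nan_cols] = df.groupby('Col1')[nan_cols].ffill().bfill()

print(nan_cols)
print(filled)
```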

Christian Sloper


If you want to `ffill` and `bfill` all columns with `NaN`, why not just fill all columns? `df.groupby('Col1').ffill().bfill()` – rafaelc Mar 09 '19 at 16:21
Hey, @Christian. I tried doing that, and it's filling the NaN-columns alright, but it is dropping the columns that have no NaN values like Col2 and Col6 – some_programmer Mar 09 '19 at 16:21
Hey @KashyapMaheshwari. Ah, OK. I misunderstood your desired output. – Christian Sloper Mar 09 '19 at 16:23
@RafaelC I interpreted the desired output to be just the NaN-columns – Christian Sloper Mar 09 '19 at 16:23
@RafaelC of course your solution is more desirable if all columns are wanted. – Christian Sloper Mar 09 '19 at 16:25

score 1 · Accepted Answer · answered Mar 09 '19 at 16:23

To be honest, I'm not sure I understand your question.

As far as I can see, you can simply do

df.groupby('Col1').ffill().bfill()

because ffill() and bfill() leave columns that contain no NaNs unchanged.
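This can be checked on the question's sample data with a small sketch. The name `result` is illustrative. Whether the grouping column Col1 appears in the result depends on the pandas version, so the check below does not rely on it.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 2, 2, 2, 2],
                   'Col2': [5, 6, 8, 3, 7, 8, 5],
                   'Col3': [2, None, None, 3, None, None, 4],
                   'Col4': [3, None, 5, None, 8, None, 66],
                   'Col5': [None, 8, 6, None, 9, 6, None],
                   'Col6': [3, 5, 2, 5, 2, 7, 9]})

# Forward-fill within each Col1 group, then back-fill what is still missing
result = df.groupby('Col1').ffill().bfill()

# Columns that had no NaN to begin with come back with the same values
print(result['Col2'].tolist() == df['Col2'].tolist())
print(result['Col6'].tolist() == df['Col6'].tolist())
```

Note that the trailing `.bfill()` runs on the ordinary DataFrame returned by the grouped `ffill()`, so it is not restricted to groups. On this sample data every group has a later value of its own to pull from, so that makes no difference here.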

Now, if you know beforehand which columns you need to forward-fill and back-fill and want to reduce verbosity, you can save them in a cols variable:

cols = ['Col3','Col4','Col5']
df[cols] = df.groupby('Col1')[cols].ffill().bfill()
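If the goal is to avoid typing the names at all, the `cols` list can also be built by slicing. The following is a sketch under the assumption that the target columns sit at positions 2 to 4 of the question's sample frame; `cols_by_label` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 2, 2, 2, 2],
                   'Col2': [5, 6, 8, 3, 7, 8, 5],
                   'Col3': [2, None, None, 3, None, None, 4],
                   'Col4': [3, None, 5, None, 8, None, 66],
                   'Col5': [None, 8, 6, None, 9, 6, None],
                   'Col6': [3, 5, 2, 5, 2, 7, 9]})

# Positional slice of the column labels: positions 2, 3 and 4
cols = df.columns[2:5].tolist()

# Label-based alternative; a .loc slice includes both endpoints
cols_by_label = df.loc[:, 'Col3':'Col5'].columns.tolist()

# Fill only the sliced columns and write them back into the full frame
df[cols] = df.groupby('Col1')[cols].ffill().bfill()

print(cols)
print(df)
```

The positional form breaks silently if the column order changes, so the label-based slice may be the safer choice when the column names are stable.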

rafaelc


Thanks. I guess I got confused with its implementation. The method suggested by you worked. – some_programmer Mar 09 '19 at 17:03
