
I have a DataFrame like:

     0    1    2
0  0.0  1.0  2.0
1  NaN  1.0  2.0
2  NaN  NaN  2.0

What I want to get is

Out[116]: 
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN
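
For reproducibility, the small example frame above can be built like this (the construction is assumed from the display above, not part of the original post):

import numpy as np
import pandas as pd

# toy frame matching the example: NaNs scattered at the start of each row
df = pd.DataFrame([[0.0, 1.0, 2.0],
                   [np.nan, 1.0, 2.0],
                   [np.nan, np.nan, 2.0]])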

This is my approach as of now.

df.apply(lambda x: x[x.notnull()].values.tolist() + x[x.isnull()].values.tolist(), axis=1)
Out[117]: 
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN

Is there any more efficient way to achieve this? apply is far too slow here. Thank you for your assistance! :)


My real data size

df.shape
Out[117]: (54812040, 1522)

2 Answers

8

Here's a NumPy solution using justify -

In [455]: df
Out[455]: 
     0    1    2
0  0.0  1.0  2.0
1  NaN  1.0  2.0
2  NaN  NaN  2.0

In [456]: pd.DataFrame(justify(df.values, invalid_val=np.nan, axis=1, side='left'))
Out[456]: 
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN
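
Here justify is a small NumPy helper that pushes the valid (non-NaN) entries of each row to one side and pads the rest with the invalid value. A minimal sketch of such a helper, assuming NaN-based masking (not necessarily identical to the implementation the answer refers to):

import numpy as np

def justify(a, invalid_val=np.nan, axis=1, side='left'):
    # Mask of valid entries.
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans puts False before True, i.e. right/bottom-justifies the mask.
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # Fill the output with the invalid value and drop the valid entries into the
    # justified positions; row-major boolean indexing keeps per-row order intact.
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out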

If you want to save memory, assign it back instead -

df[:] = justify(df.values, invalid_val=np.nan, axis=1, side='left')
Divakar
  • @Wen Would be nice to know the kind of timings you get on your dataset. – Divakar Aug 30 '17 at 23:01
  • Your solution is faster than the others, thank you for your assistance! – BENY Aug 30 '17 at 23:15
  • @Wen Would have loved some kind of timing figures on your mentioned `(54812040, 1522)` dataset :) Though I am not sure how you are holding such a huge dataset! – Divakar Aug 30 '17 at 23:16
  • Yeah, I don't think `sorted` holds a candle to numpy on 54 _**MILLION**_ data points. – cs95 Aug 30 '17 at 23:18
  • @cᴏʟᴅsᴘᴇᴇᴅ Yup, pandas apply is known for convenience from what little I know about it. – Divakar Aug 30 '17 at 23:24
  • This is what I get: TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' – zabop Aug 22 '20 at 20:36
  • The original apply() method in the OP works though :) – zabop Aug 22 '20 at 20:38
5

Your easiest option is to use sorted inside df.apply/df.transform and sort by nullity.

df = df.apply(lambda x: sorted(x, key=pd.isnull), axis=1)
df
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN

You may also pass np.isnan to the key argument.
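
Note: on newer pandas versions (0.23 and later), apply with axis=1 may hand back a Series of lists rather than a DataFrame; if that happens, result_type='expand' expands the lists back into columns (this tweak is an editorial addition, not part of the original answer):

# expand the sorted lists back into a DataFrame on pandas >= 0.23
out = df.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type='expand')
out.columns = df.columns  # restore the original column labels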

cs95