Pandas Lambda Function with Nan Support

Question

I am trying to write a lambda function in Pandas that checks whether Col1 is NaN and, if so, uses another column's data. I am having trouble getting the code below to execute correctly.

import pandas as pd
import numpy as np
df=pd.DataFrame({ 'Col1' : [1,2,3,np.NaN], 'Col2': [7, 8, 9, 10]})  
df2=df.apply(lambda x: x['Col2'] if x['Col1'].isnull() else x['Col1'], axis=1)

Does anyone have any good idea on how to write a solution like this with a lambda function or have I exceeded the abilities of lambda? If not, do you have another solution? Thanks.

Your example only has one column. You can't draw from `Col2` if there isn't a `Col2` in your dataset; more generally, you can't get "another column's data" if there isn't any other column. – Arya McCarthy May 19 '17 at 04:44
Possible duplicate of [Pandas - FillNa with another column](http://stackoverflow.com/questions/30357276/pandas-fillna-with-another-column) – Arya McCarthy May 19 '17 at 05:21
@aryamccarthy Apologies. I should have included an arbitrary 'Col2'. I'll test and come back. – Tyler Russell May 19 '17 at 05:37
This doesn't seem to work on my large DataFrame, but it does work in an example. Could this be because my actual data set has a different data type, so fillna won't work correctly? Both Col1 and Col2 in my actual set are dtype('O'), so it shouldn't be a problem. – Tyler Russell May 19 '17 at 13:51

score 44 · Accepted Answer · answered May 19 '17 at 05:10

You need pandas.isnull to check whether a scalar is NaN. Inside the lambda, x['Col1'] is a single float, and floats have no .isnull() method:

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})

df2 = df.apply(lambda x: x['Col2'] if pd.isnull(x['Col1']) else x['Col1'], axis=1)

print (df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3   NaN    10

print (df2)
0     1.0
1     2.0
2     3.0
3    10.0
dtype: float64
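
To make the difference concrete, here is a minimal sketch that reuses the DataFrame from this answer. The variable names (`row`, `value`, `has_method`, `scalar_is_null`) are only illustrative. It checks that a value pulled out of a row has no `.isnull()` method, while the module-level `pd.isnull` accepts it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [8, 9, 7, 10]})

# With axis=1 each row is passed as a Series, so row["Col1"] is a single float.
row = df.iloc[3]
value = row["Col1"]

# A float scalar has no .isnull() method, which is why the question's lambda fails.
has_method = hasattr(value, "isnull")

# pd.isnull works on scalars as well as on Series and arrays.
scalar_is_null = pd.isnull(value)

result = df.apply(
    lambda r: r["Col2"] if pd.isnull(r["Col1"]) else r["Col1"], axis=1
)

print(has_method, scalar_is_null)
print(result.tolist())
```

`pd.isna` is the same function under another name, so either spelling should behave identically here.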

A better option is Series.combine_first:

df['Col1'] = df['Col1'].combine_first(df['Col2'])

print (df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3  10.0    10

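Series.update also modifies the column in place, but it overwrites every value of Col1 for which Col2 is not NaN, so here it replaces the whole column instead of filling only the missing value: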

df['Col1'].update(df['Col2'])
print (df)
   Col1  Col2
0   8.0     8
1   9.0     9
2   7.0     7
3  10.0    10
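
The contrast between `update` and `combine_first` can be sketched on standalone Series. The names `col1`, `col2`, `sparse`, and so on are made up for this illustration. `update` takes every non-NaN value from the other Series, whereas `combine_first` only fills positions that are NaN in the caller:

```python
import numpy as np
import pandas as pd

col1 = pd.Series([1.0, 2.0, 3.0, np.nan])
col2 = pd.Series([8, 9, 7, 10])

# update: every position where col2 is not NaN is overwritten.
updated = col1.copy()
updated.update(col2)

# combine_first: only the NaN in col1 is filled from col2.
filled = col1.combine_first(col2)

# update skips positions where the *other* Series is NaN.
sparse = pd.Series([np.nan, 9, np.nan, 10])
partially_updated = col1.copy()
partially_updated.update(sparse)

print(updated.tolist())
print(filled.tolist())
print(partially_updated.tolist())
```

So `update` is only equivalent to filling NaNs when the other column is NaN everywhere the first column already has a value.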

jezrael

Thanks. Did you mean for the else in your first lambda method to be Col1 or Col2? – Tyler Russell May 19 '17 at 05:42
Hmmm, I think it is `Col2`: it means get the value of `Col2` if the condition is True, else get the value of `Col1`. – jezrael May 19 '17 at 05:44
But I prefer the other solutions if you need to replace NaNs with another column. – jezrael May 19 '17 at 05:45
Your first two methods work flawlessly. Just out of curiosity, why do you think it's better to use Series.combine_first rather than a lambda function on the df? – Tyler Russell May 19 '17 at 14:03
Because it is a faster, vectorized function. For a small DataFrame (100 rows) it is no problem, but with `1M` rows there is a huge difference. – jezrael May 19 '17 at 14:04
Thanks, I appreciate your replies! Still learning. On my actual data, the lambda function was an order of magnitude slower (8s vs. 0.8s). – Tyler Russell May 19 '17 at 14:43
Thank you for the comment. `apply` is obviously for cases where no pandas function exists and you need to write your own. Good luck! – jezrael May 19 '17 at 14:46
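
The speed gap discussed in these comments can be checked with a rough timing sketch. Everything here is an assumption for illustration: the random data, the row count `n`, and the helper names `with_apply` and `with_combine_first`. Actual timings will depend on the machine and the pandas version, so none are quoted:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000  # hypothetical size, kept small so the row-wise version finishes quickly

col1 = rng.random(n)
col1[rng.random(n) < 0.2] = np.nan  # knock out roughly 20% of the values
big = pd.DataFrame({"Col1": col1, "Col2": rng.random(n)})


def with_apply(frame):
    # Row-wise: the lambda is called once per row in Python.
    return frame.apply(
        lambda r: r["Col2"] if pd.isnull(r["Col1"]) else r["Col1"], axis=1
    )


def with_combine_first(frame):
    # Vectorized: the whole column is processed in one call.
    return frame["Col1"].combine_first(frame["Col2"])


# Both approaches should produce the same values.
same = bool((with_apply(big).to_numpy() == with_combine_first(big).to_numpy()).all())

t_apply = timeit.timeit(lambda: with_apply(big), number=1)
t_vec = timeit.timeit(lambda: with_combine_first(big), number=1)

print("identical results:", same)
print(f"apply: {t_apply:.4f}s  combine_first: {t_vec:.4f}s")
```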

score 5 · Answer 2 · answered May 19 '17 at 04:48

Assuming that you do have a second column, that is:

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [1, 2, 3, 4]})

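The correct solution to this problem is Series.fillna. The result is assigned back here instead of using inplace=True on df['Col1'], because recent pandas versions warn about in-place methods called on a column selection:

df['Col1'] = df['Col1'].fillna(df['Col2'])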

Gerges

Apologies. I should have included an arbitrary 'Col2'. I'll test and come back. – Tyler Russell May 19 '17 at 05:39
This doesn't seem to work on my large DataFrame, but it does work in an example. Could this be because my actual data set has a different data type, so fillna won't work correctly? Both Col1 and Col2 in my actual set are dtype('O'), so it shouldn't be a problem. – Tyler Russell May 19 '17 at 13:52
Works for me with object data types also. What's the issue when you use the actual dataset? – Gerges May 19 '17 at 16:43
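
The thread never says what went wrong on the real dataset, so the following is only a guess at one common cause with object columns: placeholder values such as the string 'nan' or an empty string are not treated as missing by `fillna`. The sample values and the list of placeholders are assumptions made up for this sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical object columns; "nan" and "" are ordinary strings, not missing values.
left = pd.Series(["a", "nan", "", None, np.nan], dtype=object)
right = pd.Series(["v", "w", "x", "y", "z"], dtype=object)

# Only None and np.nan are filled; the placeholder strings are left alone.
filled = left.fillna(right)

# Turning the placeholders into real NaN first lets fillna reach them too.
placeholders = ["nan", ""]
cleaned = left.mask(left.isin(placeholders)).fillna(right)

print(filled.tolist())
print(cleaned.tolist())
```

Checking `df['Col1'].isna().sum()` on the real data would show whether pandas sees any missing values at all.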

score 3 · Answer 3 · answered May 19 '17 at 04:59

You need to use np.isnan():

#import numpy as np
df2=df.apply(lambda x: 2 if np.isnan(x['Col1']) else 1, axis=1)   

df2
Out[1307]: 
0    1
1    1
2    1
3    2
dtype: int64

Allen Qin

I was trying to round non-NaN values, and this worked whilst `x is np.NaN` didn't: `df.age.apply(lambda x: x if np.isnan(x) else round(x))` – Rafs Jul 19 '23 at 09:22
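
A short sketch of why the identity test in that comment fails. `is` compares object identity, and a NaN coming out of a float Series is generally not the same object as `np.nan`, even though it is a NaN. The `ages` data and the variable names are invented for the example:

```python
import math

import numpy as np
import pandas as pd

other_nan = float("nan")  # a NaN, but a different object from np.nan

identity_check = other_nan is np.nan    # compares object identity
value_check = bool(np.isnan(other_nan)) # compares by value

ages = pd.Series([20.4, np.nan, 31.6], name="age")

# The identity test never fires on values coming out of a float Series.
identity_hits = ages.apply(lambda x: x is np.nan).tolist()

# The value-based test does, so NaN is passed through and the rest is rounded.
rounded = ages.apply(lambda x: x if np.isnan(x) else round(x))

print(identity_check, value_check, math.isnan(other_nan), pd.isna(other_nan))
print(identity_hits)
print(rounded.tolist())
```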

David Wei · Answer 4 · 2022-01-06T08:05:35.830

Within pandas 0.24.2, I use

df.apply(lambda x: x['col_name'] if x[col1] is np.nan else expressions_another, axis=1)

because pd.isnull() doesn't work.

In my work, I found the following behavior.

This produces no results:

df['prop'] = df.apply(lambda x: (x['buynumpday'] / x['cnumpday']) if pd.isnull(x['cnumpday']) else np.nan, axis=1)

This produces results:

df['prop'] = df.apply(lambda x: (x['buynumpday'] / x['cnumpday']) if x['cnumpday'] is not np.nan else np.nan, axis=1)

So far, I still don't know the deeper reason, but from experience: for object dtype, use `is np.nan` or pd.isna(); for float, use np.isnan() or pd.isna().
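
One possible explanation, offered as a guess and not a confirmed diagnosis: the two snippets above test opposite conditions. The first divides only when `cnumpday` is null, while the second divides when it `is not np.nan`. The sketch below uses made-up numbers and a hypothetical helper `ratio` to show how the missing `not` alone yields an all-NaN column. It also shows that `np.isnan` rejects non-numeric input while `pd.isna` does not:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"buynumpday": [4.0, 9.0, 5.0],
                   "cnumpday": [2.0, 3.0, np.nan]})


def ratio(row):
    return row["buynumpday"] / row["cnumpday"]


# Same shape as the first snippet: divides only when cnumpday IS null.
inverted = df.apply(
    lambda x: ratio(x) if pd.isnull(x["cnumpday"]) else np.nan, axis=1
)

# With `not`, the condition matches the intent of the second snippet.
fixed = df.apply(
    lambda x: ratio(x) if not pd.isnull(x["cnumpday"]) else np.nan, axis=1
)

# Division already propagates NaN, so the check is not strictly needed here.
vectorized = df["buynumpday"] / df["cnumpday"]

# np.isnan only accepts numeric input; pd.isna also handles arbitrary objects.
try:
    np.isnan("text")
    isnan_accepts_str = True
except TypeError:
    isnan_accepts_str = False

print(inverted.isna().all())
print(fixed.tolist(), vectorized.tolist())
print(isnan_accepts_str, pd.isna("text"), pd.isna(None))
```

`pd.isna` accepts both float and object values, which fits the experience described in this answer.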
