Lambda apply include NaN values in new column

Question

I am creating a new column in a dataframe based on a conditional function. In the column I am mapping from, there are multiple NaN values. If a NaN value is present in the original column, I also want this to be present in my new column. As an example, my starting point is this:

Here is a sample of the code I initially run which gives the following result (clearly):

def get_value(range):
    if range < 2:
        return 'Below 2'
    elif range < 8:
        return 'Between 2 and 8'
    else:
        return 'Above 8'

df_sample['new_col'] = df_sample.apply(lambda x: get_value(x['Original']), axis=1)

Original    new_col
0   1.0 Below 2
1   2.0 Between 2 and 8
2   3.0 Between 2 and 8
3   4.0 Between 2 and 8
4   5.0 Between 2 and 8
5   6.0 Between 2 and 8
6   NaN Above 8
7   8.0 Above 8
8   9.0 Above 8
9   10.0    Above 8

Here, index 6 should show NaN.

I have tried including elif range == np.Nan: in my function but this didn't work.

I then tried the following based on a recommendation on Stackoverflow:

df_sample['new_col'] = df_sample.apply(lambda x: get_value(x) if(np.all(pd.notnull(x['Original']))) else x, axis = 1)

But this returns an error at the first NaN index in my dataframe.

This is not a recommended way of doing this, but if you insist then you could add `if np.isnan(range): return np.nan` condition — Chris Adams, Mar 03 '20 at 16:55
@ChrisA do you mind explaining why this is not recommended? Thanks a lot. — ojp, Mar 03 '20 at 21:06

Chris Adams · Accepted Answer · 2020-03-03T17:10:48.550

Déjà vu here, but following on from my last solution, just add default for where no condition is met:

import numpy as np 

condlist = [
    df['Original'].lt(2),
    df['Original'].lt(8),
    df['Original'].ge(8)]

choicelist = ['Below 2', 'Between 2 and 8', 'Above 8']

df['new_col'] = np.select(condlist, choicelist, default=np.nan)
print(df)

[out]

   Original          new_col
0       1.0          Below 2
1       2.0  Between 2 and 8
2       3.0  Between 2 and 8
3       4.0  Between 2 and 8
4       5.0  Between 2 and 8
5       6.0  Between 2 and 8
6       NaN              nan
7       8.0          Above 8
8       9.0          Above 8
9      10.0          Above 8

Thanks Chris, sorry I got stuck with this and thought it would be best to create a new question — ojp, Mar 03 '20 at 17:10

Quang Hoang · Answer 2 · 2020-03-03T17:06:14.347

1

In general, do not use apply. In thise case, cut is a much better choice:

pd.cut(df.Original, [-np.inf, 2, 8, np.inf],
       labels = ['below 2', 'between 2 and 8', 'above 8'],
       right=False)

Output:

0            below 2
1    between 2 and 8
2    between 2 and 8
3    between 2 and 8
4    between 2 and 8
5    between 2 and 8
6                NaN
7            above 8
8            above 8
9            above 8
Name: Original, dtype: category
Categories (3, object): [below 2 < between 2 and 8 < above 8]

edited Mar 03 '20 at 17:06

answered Mar 03 '20 at 16:52

Quang Hoang

146,074
10
56
74

I've already suggested [this](https://stackoverflow.com/a/60510900/10201580)... don't think he's interested – Chris Adams Mar 03 '20 at 17:02

dantarno · Answer 3 · 2020-03-03T17:01:11.077

To comment on your code :

As you're using the else statement, everything not bellow 8 will be display as 'above 8'. Even if in your original dataset you have the string 'hello world'.

To stay as simple as your code you could do :

def get_value(range):
    if range < 2:
        return 'Below 2'
    elif range < 8:
        return 'Between 2 and 8'
    elif range >= 8:
        return 'Above 8'
    else:
        return np.nan

Lambda apply include NaN values in new column

3 Answers3