45

I have a pandas data frame, sample, with one of the columns called PR to which am applying a lambda function as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?

Amani
  • 16,245
  • 29
  • 103
  • 153
  • Note that for most operations, `apply` is not needed at all. Please use vectorized operations instead. See [here](https://stackoverflow.com/a/73643899/19123103) and [here](https://stackoverflow.com/a/62968313/19123103) for more info. – cottontail Feb 15 '23 at 06:16

2 Answers2

49

You need mask:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)

Another solution with loc and boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
      PR
0    NaN
1  100.0
2    NaN
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
      PR
0    NaN
1  100.0
2    NaN

EDIT:

Solution with apply:

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)

Timings len(df)=300k:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
29

You need to add else in your lambda function. Because you are telling what to do in case your condition(here x < 90) is met, but you are not telling what to do in case the condition is not met.

sample['PR'] = sample['PR'].apply(lambda x: 'NaN' if x < 90 else x) 
Ruthger Righart
  • 4,799
  • 2
  • 28
  • 33