Drop row if lambda returns None Pandas

Question

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))

How can I drop those rows where my_map.get(x) returns None?

I am looking for a solution where I do not have to iterate over the column again to drop rows.

Thanks

Does this mean I first `apply` and then run `dropna`? — Raheel, Nov 03 '17 at 12:54
This sounds like you might be better served doing a left join from a dataframe made from mymap? — jeremycg, Nov 03 '17 at 13:08

jezrael · Accepted Answer · 2017-11-03T13:04:32.073

4

I think you need dropna: assigning the result to a new column stores the None results as missing values (NaN), and those rows can then be removed:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
self.df = self.df.dropna(subset=['X'])

Or:

self.df = self.df[self.df['X'].notnull()]
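
A minimal sketch of both variants on a toy frame. The `df` and `my_map` below are made up for illustration and stand in for `self.df` and the dictionary from the question:

```python
import pandas as pd

# Hypothetical stand-ins for self.df and my_map from the question.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a"]})

# Keys missing from my_map give None, which ends up as a missing value in X.
df["X"] = df["x"].apply(lambda x: my_map.get(x))

# Variant 1: drop rows whose X is missing.
dropped = df.dropna(subset=["X"])

# Variant 2: boolean mask. The .copy() makes the result an independent frame,
# which avoids SettingWithCopyWarning on later column assignments.
masked = df[df["X"].notnull()].copy()

print(dropped)
print(masked.equals(dropped))
```

Both variants scan the new column once to build the mask, so the extra cost is small compared with the `apply` itself.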

edited Nov 03 '17 at 13:04

answered Nov 03 '17 at 13:00

jezrael

I understand this logic, but I am worried because my df is 250k rows, and it is one chunk out of 60 million. – Raheel Nov 03 '17 at 13:01
@RaheelKhan - I added another solution, it is a [bit faster](https://stackoverflow.com/a/46091980/2901002) – jezrael Nov 03 '17 at 13:04
I suspect that if you ask a different question where you share what you are trying to do with the entire dataframe and what your lambda is, you’d get a much better answer – piRSquared Nov 03 '17 at 13:04
I tried it with the 250k df, and it didn't make any real difference. Thanks – Raheel Nov 03 '17 at 13:15
@piRSquared `my_map` is just a dictionary whose keys are the `x` values; I am assigning the value for that key to a new column `X`. But since the data is huge, many `x` values will have no match in my `dict`, so I don't want those records in my df. – Raheel Nov 03 '17 at 13:17
Jezrael, please check this https://stackoverflow.com/questions/47096797/how-to-match-a-word-in-a-datacolumn-with-a-list-of-values-and-applying-ignorecas – Pyd Nov 03 '17 at 13:23
@jezrael Now my other transformation function is not working. `A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead` – Raheel Nov 03 '17 at 16:17
Maybe you need a copy, I believe: `self.df[self.df['X'].notnull()].copy()` – jezrael Nov 03 '17 at 16:37

score 3 · Answer 2 · answered Nov 03 '17 at 12:58

3

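Both loc and pd.Series.compress accept a callable. The callable is called once with the whole Series (not once per element) and must return a boolean mask; the result is the subset where the mask is True. Note that Series.compress was deprecated in pandas 0.24 and removed in 1.0, so prefer loc on current versions.

compress

self.df['x'].compress(lambda s: s.map(my_map).notnull())

loc

self.df['x'].loc[lambda s: s.map(my_map).notnull()]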
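
To drop the rows from the whole frame rather than select from the one column, the callable form of `loc` can be applied to the DataFrame itself. A sketch, assuming a made-up `df` and `my_map`; note that it treats a key mapped to `None` the same as a missing key:

```python
import pandas as pd

# Hypothetical data standing in for self.df and my_map.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "c", "b"], "y": [10, 20, 30]})

# loc calls the function once with the whole DataFrame and expects a boolean mask.
kept = df.loc[lambda d: d["x"].map(my_map).notnull()]

# Adding the mapped column afterwards only touches the surviving rows.
kept = kept.assign(X=kept["x"].map(my_map))
print(kept)
```

This still looks each key up twice (once for the mask, once for the new column), so it is not obviously faster than `apply` followed by `dropna`.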

answered Nov 03 '17 at 12:58

piRSquared

So need `self.df = self.df.dropna('X')` – jezrael Nov 03 '17 at 12:59
@jezrael thinking – piRSquared Nov 03 '17 at 13:01
can you please check this one https://stackoverflow.com/questions/47096797/how-to-match-a-word-in-a-datacolumn-with-a-list-of-values-and-applying-ignorecas – Pyd Nov 03 '17 at 13:23

Fabian Ying · Answer 3 · 2017-11-03T13:01:33.747

1

You can find the indices as follows

idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X

Full code:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X
self.df = self.df.drop(idxs)

edited Nov 03 '17 at 13:01

answered Nov 03 '17 at 12:53

Fabian Ying

Do you think `self.df = self.df.dropna('X')` would be a more optimized way? – Raheel Nov 03 '17 at 12:57
`self.df['X'] == None` returns False :( – jezrael Nov 03 '17 at 13:00
@jezrael Yes, you're right. You can use .isnull(), but I think `self.df.dropna('X')` is a cleaner solution. – Fabian Ying Nov 03 '17 at 13:02

score 0 · Answer 4 · answered Nov 03 '17 at 13:26

You can do this as a merge if you convert my_map to a DataFrame:

mymerge = pd.DataFrame.from_dict(my_map, orient='index')

Then merge on the key. merge defaults to an inner join, so only rows whose x is a key in the dictionary are kept:

mymerge.merge(df, left_index=True, right_on='x')

In one line:

pd.DataFrame.from_dict(my_map, orient='index').merge(df, left_index=True, right_on='x')
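
A small sketch of the merge approach, assuming a toy `df` and `my_map` (both invented here) and naming the value column `X` so the result matches the question. It relies on `merge` defaulting to an inner join, which is what discards the unmatched rows:

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame and dictionary.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "c", "b", "a"]})

# One row per dictionary key: the key becomes the index, the value goes in "X".
lookup = pd.DataFrame.from_dict(my_map, orient="index", columns=["X"])

# Inner join (the default): rows of df whose x is not a key in my_map disappear.
result = df.merge(lookup, left_on="x", right_index=True)
print(result)
```

Here `df` is on the left so its columns come first in the result; the set of rows kept is the same as in the version with the lookup table on the left.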
