self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))

How can I drop the rows where `my_map.get(x)` returns `None`?

I am looking for a solution where I do not have to iterate over the column again to drop the rows.

Thanks

Raheel

4 Answers


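I think you need `dropna`. `my_map.get(x)` returns `None` for keys that are not in the dict, so assigning the result to a new column creates missing values (`None`/`NaN`) there, which can then be removed in a single vectorized step: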

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
# subset= restricts the missing-value check to the new column X
self.df = self.df.dropna(subset=['X'])

Or:

self.df = self.df[self.df['X'].notnull()]
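A minimal, self-contained sketch of both variants, assuming a toy DataFrame with a column `x` and a small dictionary `my_map` (the data is made up for illustration; in the question these live on `self.df`). Note that `dropna` needs `subset=['X']` so that only the new column is checked.

```python
import pandas as pd

# Hypothetical sample data standing in for self.df and my_map
df = pd.DataFrame({'x': ['a', 'b', 'z', 'c', 'y']})
my_map = {'a': 1, 'b': 2, 'c': 3}

# dict.get returns None for unmatched keys; pandas treats None as missing
df['X'] = df['x'].apply(lambda x: my_map.get(x))

# Variant 1: drop rows whose X is missing
out_dropna = df.dropna(subset=['X'])

# Variant 2: keep only rows whose X is not missing (same rows, same index)
out_mask = df[df['X'].notnull()]

print(out_dropna)
```

Both variants keep the rows for `a`, `b` and `c` and leave the original index labels untouched.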
jezrael
  • I understand this logic, but I am worried because one df is 250k rows, and this is a chunk out of 60 million. – Raheel Nov 03 '17 at 13:01
  • @RaheelKhan - I added another solution; it is a [bit faster](https://stackoverflow.com/a/46091980/2901002) – jezrael Nov 03 '17 at 13:04
  • I suspect that if you ask a different question where you share what you are trying to do with the entire dataframe and what your lambda is, you’d get a much better answer – piRSquared Nov 03 '17 at 13:04
  • I tried it with a 250k df and it didn't make much difference. Thanks – Raheel Nov 03 '17 at 13:15
  • @piRSquared `my_map` is just a dictionary whose keys are values of `x`; I assign the value for each key to `X`, a new column. But since the data is huge, there will be many `x` values that are not in my `dict`, and I don't want those records in my df. – Raheel Nov 03 '17 at 13:17
  • Jezrael, please check this https://stackoverflow.com/questions/47096797/how-to-match-a-word-in-a-datacolumn-with-a-list-of-values-and-applying-ignorecas – Pyd Nov 03 '17 at 13:23
  • @jezrael Now my other transformation function is not working: `A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead` – Raheel Nov 03 '17 at 16:17
  • I believe you need a copy: `self.df[self.df['X'].notnull()].copy()` – jezrael Nov 03 '17 at 16:37
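A small sketch of the `.copy()` suggestion from the comments, assuming made-up data and a hypothetical later transformation that adds a column `Y`. In pandas versions without copy-on-write, assigning to a boolean-filtered frame can trigger `SettingWithCopyWarning` because pandas may treat it as a view of the original; `.copy()` makes the filtered frame an explicit, independent object.

```python
import warnings

import pandas as pd

# Hypothetical frame after the lookup step; None in X marks an unmatched key
df = pd.DataFrame({'x': ['a', 'b', 'z'], 'X': [1.0, 2.0, None]})

# .copy() detaches the filtered rows from df
filtered = df[df['X'].notnull()].copy()

with warnings.catch_warnings():
    warnings.simplefilter('error')  # any warning here becomes an exception
    filtered['Y'] = filtered['X'] * 10  # hypothetical later transformation
```

The assignment runs without a warning, and `df` itself is not modified.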

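`loc` accepts a callable: it is called with the Series itself and must return a boolean mask, and `loc` returns the subset where the mask is True. Because the callable receives the whole Series rather than one element, use `Series.map` for the lookup (keys missing from the dict map to NaN). `pd.Series.compress` was deprecated in pandas 0.24 and removed in 1.0; plain boolean indexing does the same job.

boolean indexing (replaces compress)

self.df['x'][self.df['x'].map(my_map).notnull()]

loc

self.df['x'].loc[lambda s: s.map(my_map).notnull()]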
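A runnable sketch of the `loc`-with-a-callable idea, assuming toy data in place of `self.df` and `my_map`. Since the callable is passed the whole Series (or the whole DataFrame when used on `df.loc`), the lookup is vectorized with `Series.map` instead of calling `my_map.get` on a single value.

```python
import pandas as pd

# Hypothetical sample data standing in for self.df and my_map
df = pd.DataFrame({'x': ['a', 'b', 'z', 'c', 'y']})
my_map = {'a': 1, 'b': 2, 'c': 3}

# On a Series: the callable gets the Series and returns a boolean mask
kept_x = df['x'].loc[lambda s: s.map(my_map).notnull()]

# On the DataFrame: the callable gets the DataFrame, so whole rows are kept
kept_rows = df.loc[lambda d: d['x'].map(my_map).notnull()]
```

Filtering the DataFrame rather than the single column is usually what the question needs, since it keeps every column of the matching rows.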
piRSquared

You can find the indices of the unmatched rows as follows:

idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X

Full code:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X
self.df = self.df.drop(idxs)
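A self-contained version of this approach, assuming a toy DataFrame in place of `self.df` and a made-up `my_map`:

```python
import pandas as pd

# Hypothetical sample data standing in for self.df and my_map
df = pd.DataFrame({'x': ['a', 'b', 'z', 'c', 'y']})
my_map = {'a': 1, 'b': 2, 'c': 3}

df['X'] = df['x'].apply(lambda x: my_map.get(x))

# Index labels of the rows where the lookup failed (None/NaN in X)
idxs = df.index[df['X'].isnull()]
df = df.drop(idxs)
```

Dropping by label assumes the index is unique: with duplicate labels, `drop` would also remove rows that share a label with an unmatched row even if their own lookup succeeded. Boolean indexing avoids that issue.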
Fabian Ying

You can do this as a merge if you convert your `my_map` dict to a DataFrame:

mymerge = pd.DataFrame.from_dict(my_map, orient='index')

Then use an inner join (the default for `merge`), which keeps only the rows whose `x` appears as a key in the map:

mymerge.merge(df, left_index=True, right_on='x')

In one line:

pd.DataFrame.from_dict(my_map, orient='index').merge(df, left_index=True, right_on='x')
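A self-contained sketch of the merge approach, assuming toy data (the `other` column is made up to show that the remaining columns survive the join). It names the lookup column `X` via the `columns` argument of `from_dict` and puts `df` on the left, which is an equivalent arrangement of the merge above.

```python
import pandas as pd

# Hypothetical sample data standing in for self.df and my_map
df = pd.DataFrame({'x': ['a', 'b', 'z', 'c', 'y'], 'other': [10, 20, 30, 40, 50]})
my_map = {'a': 1, 'b': 2, 'c': 3}

# One-column lookup table: index = dict keys, column 'X' = dict values
lookup = pd.DataFrame.from_dict(my_map, orient='index', columns=['X'])

# Inner join (the merge default) drops rows whose x is not a key of my_map
result = df.merge(lookup, left_on='x', right_index=True)
```

Unmatched keys never produce a `None` here, so there is nothing left to drop afterwards.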
jeremycg