How can I drop rows in a dataframe efficiently if a specific column contains a substring

Question

I tried

df = df[~df['event.properties.comment'].isin(['Extra'])]

The problem is that this only drops a row when the column is exactly 'Extra'; I need to drop the rows that contain it even as a substring.

Any help?

Naga kiran · Answer 1 · 2018-09-12T17:37:52.243


You can build a boolean mask with str.contains and negate it with ~ to drop the rows whose text contains "Extra". Fill NaN values with an empty string first so that the mask contains only booleans. Consider this df:

   vals         ids
0     1           ~
1     2       bball
2     3         NaN
3     4  Extra text

df[~df.ids.fillna('').str.contains('Extra')]

Out:

   vals    ids
0     1      ~
1     2  bball
2     3    NaN
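For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample frame shown above; the variable names `mask` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame matching the one shown in this answer.
df = pd.DataFrame({
    "vals": [1, 2, 3, 4],
    "ids": ["~", "bball", np.nan, "Extra text"],
})

# Replace NaN with '' only while building the mask, so the mask is purely boolean.
mask = df["ids"].fillna("").str.contains("Extra")

# ~mask keeps the rows that do NOT contain the substring.
# The original NaN is preserved because fillna was applied to the mask only.
result = df[~mask]
print(result)
```

Since `fillna('')` is applied only when computing the mask, the NaN in the kept row is not overwritten in the result.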

edited Sep 12 '18 at 17:37

answered Sep 12 '18 at 17:08

Naga kiran


I have 2 problems with this: 1. NaN values, 2. When I use ~ (I need to keep the rows that don't contain the word) this error appears: bad operand type for unary ~: 'float' – Adrian Torrejón Sep 12 '18 at 17:12

@AdrianTorrejón Sounds like your column has nulls in it. Pandas treats all nulls as NaNs (which are always floats because pandas is currently built on top of numpy). You will have to pick a suitable string mapping of your NaNs. – PMende Sep 12 '18 at 17:15
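
To illustrate the point about nulls: one possible alternative to picking a replacement string is the `na` argument of `str.contains`, which tells pandas what to put in the mask for missing values. This is a sketch on a hypothetical Series `ids`, not the OP's actual column.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a missing value.
ids = pd.Series(["~", "bball", np.nan, "Extra text"])

# na=False treats missing values as "no match", so the mask holds only
# booleans and can be negated with ~ without a TypeError.
safe_mask = ids.str.contains("Extra", na=False)

# Keep everything that does not contain the substring (the NaN row stays).
kept = ids[~safe_mask]
print(kept)
```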

The OP wants to drop the rows containing 'Extra'; your solution keeps those rows and drops the rest – Vaishali Sep 12 '18 at 17:31
Thanks, works perfectly! Any way to make it not sensitive to upper-lower case? – Adrian Torrejón Sep 12 '18 at 17:44

df[~df.ids.fillna('').str.contains('Extra',case=False)] – Pyd Sep 12 '18 at 17:48
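
A slightly expanded sketch of the case-insensitive variant, assuming a sample frame with one extra lowercase row added for illustration. Here `regex=False` is an optional choice that makes pandas treat the pattern as a literal substring rather than a regular expression, and `na=False` stands in for `fillna('')`.

```python
import numpy as np
import pandas as pd

# Sample frame; the last row is a made-up addition to show case-insensitivity.
df = pd.DataFrame({
    "vals": [1, 2, 3, 4, 5],
    "ids": ["~", "bball", np.nan, "Extra text", "some extra words"],
})

# case=False ignores upper/lower case; regex=False matches the text literally;
# na=False counts missing values as "no match".
mask = df["ids"].str.contains("extra", case=False, regex=False, na=False)

# Drop every row whose text contains the substring in any casing.
result = df[~mask]
print(result)
```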

