Deleting data in pandas given a string condition

Question

I am having trouble understanding the mechanics here given the following.

I have a DataFrame read from a .csv:

  a1 b1 c1
1 aa bb cc
2 ab ba ca 

df.drop(df['a1'].str.contains('aa',case = False))

I want to drop all rows where column a1 contains 'aa'.

I believe I have tried everything suggested on here, but I still get:

ValueError: labels [False False False ... False False False] not contained in axis

Yes, I have also tried

skipinitialspace=True
axis=1

Any help would be appreciated, thank you.

`df[~df.a1.str.contains('aa')]` – BENY May 14 '18 at 18:12

score 6 · Accepted Answer · edited May 14 '18 at 20:39

str.contains returns a mask:

df['a1'].str.contains('aa',case = False)

1     True
2    False
Name: a1, dtype: bool

However, drop accepts index labels, not boolean masks. If you open up the help on drop, you may observe this first-hand:

?df.drop

Signature: df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.

Parameters
----------
labels : single label or list-like
    Index or column labels to drop.

You could work out the index labels from the mask and pass those to drop:

idx = df.index[df['a1'].str.contains('aa')]
df.drop(idx)

   a1  b1  c1
2  ab  ba  ca
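
As a minimal, self-contained sketch of the label-based approach, the snippet below rebuilds a frame resembling the one in the question and converts the mask to index labels before calling drop. The sample data and the variable names `mask`, `labels_to_drop`, and `result` are assumptions for illustration only.

```python
import pandas as pd

# Sample frame mirroring the question (index labels 1 and 2); illustrative data.
df = pd.DataFrame(
    {"a1": ["aa", "ab"], "b1": ["bb", "ba"], "c1": ["cc", "ca"]},
    index=[1, 2],
)

# Boolean mask: True where column a1 contains 'aa' (case-insensitive).
mask = df["a1"].str.contains("aa", case=False)

# Turn the mask into index labels, which is what drop expects.
labels_to_drop = df.index[mask]

# drop returns a new frame; df itself is left unchanged.
result = df.drop(labels_to_drop)
print(result)
```

Because drop returns a new object by default, the result has to be assigned back (or `inplace=True` passed) for the change to persist.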

However, this is long-winded, so I'd recommend sticking to the idiomatic pandas way of dropping rows based on a condition: boolean indexing.

df[~df['a1'].str.contains('aa')]

   a1  b1  c1
2  ab  ba  ca
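
A hedged sketch of the same boolean-indexing idea on data read from a file, where the column may contain missing values. The sample frame and the use of `na=False` are assumptions not taken from the question; `na=False` makes missing entries count as "no match" so the mask stays purely boolean and can be inverted with `~`.

```python
import pandas as pd

# Illustrative data: one missing value and one mixed-case match.
df = pd.DataFrame(
    {"a1": ["aa", "ab", None, "xAAy"], "b1": ["bb", "ba", "bc", "bd"]}
)

# True where a1 contains 'aa', ignoring case; missing values become False.
mask = df["a1"].str.contains("aa", case=False, na=False)

# Keep only the rows that do NOT match.
kept = df[~mask]
print(kept)
```

Rows 0 and 3 match and are removed. The row with the missing value is kept because it is treated as a non-match.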

If anyone is interested in removing rows that contain any of the strings in a list:

df = df[~df['a1'].str.contains('|'.join(my_list))]

Make sure to strip whitespace. Credit to https://stackoverflow.com/a/45681254/9500464
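
A sketch of the list-based variant, under two assumptions of mine: that the whitespace advice refers to stray spaces in the list entries, and that the entries should be matched literally. Since str.contains treats the joined pattern as a regular expression by default, each entry is passed through `re.escape` here. `my_list` and the sample data are hypothetical.

```python
import re

import pandas as pd

# Illustrative data; "cxd" would match an unescaped "c.d" pattern.
df = pd.DataFrame({"a1": ["aa", "ab", "c.d", "cxd"]})

# Hypothetical list of substrings; the first entry has a stray trailing space.
my_list = ["aa ", "c.d"]

# Strip whitespace from each entry, escape regex metacharacters,
# and join the entries with '|' (regex alternation).
pattern = "|".join(re.escape(s.strip()) for s in my_list)

# Keep the rows whose a1 value contains none of the entries.
result = df[~df["a1"].str.contains(pattern)]
print(result)
```

Without the strip, "aa " would not match the value "aa". Without the escape, the "." in "c.d" would match any character and "cxd" would be dropped as well.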

edited May 14 '18 at 20:39 by satoshi

answered May 14 '18 at 18:12 by cs95

A trivial speed improvement, if applicable, is to set `regex=False`. – jpp May 14 '18 at 18:16
So you're recommending to drop the mask, basically :-) – Ami Tavory May 14 '18 at 18:17
Thank you for this! I am still so confused I really need to review this in depth. – satoshi May 14 '18 at 18:17
@g_altobelli it's pretty straightforward: it needs to know the index labels, because it will remove those. It doesn't accept a boolean mask because it doesn't _need_ to accept one (there are enough indexers that do that already; `__getitem__` and `loc` are two of them). – cs95 May 14 '18 at 18:20
@cᴏʟᴅsᴘᴇᴇᴅ I just got it thank you!!! This makes so much more sense now. I am not sure why I made this so complicated for myself. – satoshi May 14 '18 at 18:22
