
I am having trouble understanding the mechanics here given the following.

I have a DataFrame read from a .csv file:

  a1 b1 c1
1 aa bb cc
2 ab ba ca 

df.drop(df['a1'].str.contains('aa',case = False))

I want to drop all rows where column a1 contains 'aa'.

I believe I have tried everything suggested on here, but I still get:

ValueError: labels [False False False ... False False False] not contained in axis

Yes, I have also tried:

skipinitialspace=True
axis=1

Any help would be appreciated, thank you.

N8888
satoshi

1 Answer


str.contains returns a mask:

df['a1'].str.contains('aa',case = False)

1     True
2    False
Name: a1, dtype: bool

However, drop accepts index labels, not boolean masks. If you open up the help on drop, you may observe this first-hand:

?df.drop

Signature: df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.

Parameters
----------
labels : single label or list-like
    Index or column labels to drop.

You could work out the index labels from the mask and pass those to drop:

idx = df.index[df['a1'].str.contains('aa')]
df.drop(idx)

   a1  b1  c1
2  ab  ba  ca

However, this is long-winded, so I'd recommend sticking to the idiomatic pandas way of dropping rows based on a condition, boolean indexing:

df[~df['a1'].str.contains('aa')]

   a1  b1  c1
2  ab  ba  ca
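
To compare the two approaches side by side, here is a minimal, self-contained sketch. It rebuilds the two-row frame from the question by hand instead of reading the .csv, and it passes `na=False` on the assumption that the column may contain missing values, which would otherwise leave NaN in the mask and make `~` fail.

```python
import pandas as pd

# Rebuild the two-row frame from the question (index labels 1 and 2).
df = pd.DataFrame(
    {"a1": ["aa", "ab"], "b1": ["bb", "ba"], "c1": ["cc", "ca"]},
    index=[1, 2],
)

# Boolean mask: True where a1 contains 'aa', ignoring case.
# na=False (an assumption, see above) counts missing values as non-matches.
mask = df["a1"].str.contains("aa", case=False, na=False)

# Option 1: turn the mask into index labels, then drop those labels.
dropped = df.drop(df.index[mask])

# Option 2: boolean indexing, keeping the rows where the mask is False.
filtered = df[~mask]

print(filtered)
```

Both options leave only the row labelled 2. Neither modifies `df` in place, so assign the result back if you want to keep it.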

If anyone is interested in removing rows that contain any of the strings in a list:

df = df[~df['a1'].str.contains('|'.join(my_list))]

Make sure to strip whitespace. Credit to https://stackoverflow.com/a/45681254/9500464
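
One caveat with joining a list using '|' is that the result is treated as a regular expression, so entries containing characters such as '.' or '+' match more than their literal text. A hedged sketch of one way around that, using `re.escape` on each entry; the sample frame and the contents of `my_list` are made up for illustration:

```python
import re

import pandas as pd

# Hypothetical data and search terms, not taken from the question.
df = pd.DataFrame({"a1": ["aa", "ab", "c.d", "cxd"]})
my_list = ["aa ", "c.d"]  # note the stray trailing space in the first entry

# Strip whitespace from each term, then escape regex metacharacters so
# "c.d" only matches a literal dot and not "cxd".
terms = [re.escape(term.strip()) for term in my_list]
pattern = "|".join(terms)

# Keep the rows whose a1 value matches none of the terms.
result = df[~df["a1"].str.contains(pattern, na=False)]

print(result)
```

Note that an empty `my_list` produces an empty pattern, which matches every string and so removes every row; it may be worth guarding against that case.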

satoshi
cs95
  • A trivial speed improvement, if applicable, is to set `regex=False`. – jpp May 14 '18 at 18:16
  • So you're recommending to drop the mask, basically :-) – Ami Tavory May 14 '18 at 18:17
  • Thank you for this! I am still so confused I really need to review this in depth. – satoshi May 14 '18 at 18:17
  • @g_altobelli it's pretty straightforward, it needs to know the index labels, because it will remove those. It doesn't accept a boolean mask because it doesn't _need_ to accept one (there's enough indexers that do that already, `__getitem__` and `loc` are two of them). – cs95 May 14 '18 at 18:20
  • @cᴏʟᴅsᴘᴇᴇᴅ I just got it thank you!!! This makes so much more sense now. I am not sure why I made this so complicated for myself. – satoshi May 14 '18 at 18:22
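
For anyone following up on the `regex=False` suggestion in the comments above, here is a small sketch of what it changes besides speed: the pattern is matched as plain text, so regex metacharacters lose their special meaning. The sample values are invented for illustration.

```python
import pandas as pd

s = pd.Series(["a.b", "axb", "AA"])  # hypothetical sample values

# Default behaviour: the pattern is a regex, so "." matches any character.
as_regex = s.str.contains("a.")

# regex=False: the pattern is a plain substring, so "." must be a literal dot.
as_literal = s.str.contains("a.", regex=False)

print(as_regex.tolist(), as_literal.tolist())
```

So `regex=False` only applies when the search term really is a literal substring; a pattern built with '|' still needs the regex path.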