
For simplicity, since my actual data set is very large, let's say I have a DataFrame:

import pandas as pd

df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

I need to filter this DataFrame so that an entire row is dropped whenever the value in a specified column contains a partial, case-insensitive match for the string "foo". I tried the following, to no avail. PS: my regex skills are weak, so forgive me if that's the reason it isn't working.

df = df[df['Col_A'] != '^[Ff][Oo][Oo].*']

Due to the size of my data set, efficiency is a concern, which is why I have not opted for iteration. Thanks in advance.

Trace R.
  • @Wiktor Stribiżew the question that you marked as duplicate seems to concern filtering entire columns, rather than the content contained within the columns. – Trace R. Aug 21 '19 at 23:39

2 Answers


Use str.match:

df[~df['Col_A'].str.match('^[Ff][Oo][Oo].*')]

Result:

    Col_A   Col_B
1   Bar     Bar2
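As a side note, str.match also accepts a case=False flag, so the [Ff][Oo][Oo] character classes can be replaced by a plain, lowercase pattern. A minimal runnable sketch, reusing the question's sample data:

```python
import pandas as pd

# Sample frame mirroring the question's data.
df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

# case=False makes the match case-insensitive; str.match anchors at the
# start of the string, so no leading ^ is needed.
filtered = df[~df['Col_A'].str.match('foo', case=False)]
# Keeps only the 'Bar' row, same as the regex version above.
```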
pythonic833
  • This solution is just what I needed and seems to be moldable for other situations I need to do this in. Thank you so much. – Trace R. Aug 21 '19 at 23:48

Another method would be to use str.startswith with str.lower and the NOT operator ~:

df[~df['Col_A'].str.lower().str.startswith('foo')]

Output

  Col_A Col_B
1   Bar  Bar2
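Since the question says "contains a partial" match, and both answers only test the start of the string, here is a hedged variant using str.contains, which looks anywhere in the string. The extra 'XFoo' row is a hypothetical addition to show the difference:

```python
import pandas as pd

# Question's data plus one hypothetical row where 'foo' is not a prefix.
df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3'],
                   ['XFoo', 'XFoo4']],
                  columns=['Col_A', 'Col_B'])

# str.contains matches anywhere in the string; case=False makes it
# case-insensitive, and regex=False treats 'foo' as a literal substring.
filtered = df[~df['Col_A'].str.contains('foo', case=False, regex=False)]
# Drops 'Foo', 'FooBar', and 'XFoo', keeping only the 'Bar' row.
```

A startswith-based filter would keep 'XFoo'; str.contains drops it, so pick whichever matches the intended semantics.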
Erfan