
For simplicity, since my actual data set is very large, let's say I have a DataFrame:

import pandas as pd

df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

I need to filter this DataFrame so that an entire row is dropped whenever the value in a specified column contains a partial, case-insensitive match for the string "foo". I tried the following, to no avail. PS: my regex skills are weak, so forgive me if that's the reason it isn't working.

df = df[df['Col_A'] != '^[Ff][Oo][Oo].*']

Due to the size of my data set, efficiency is a concern, which is why I have not opted for iteration. Thanks in advance.

Trace R.
  • @Wiktor Stribiżew the question that you marked as duplicate seems to concern filtering entire columns, rather than the content contained within the columns. – Trace R. Aug 21 '19 at 23:39

2 Answers


Use str.match:

df[~df['Col_A'].str.match('^[Ff][Oo][Oo].*')]

Result:

    Col_A   Col_B
1   Bar     Bar2
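As a side note, str.match also accepts a case=False flag, so the [Ff][Oo][Oo] character classes can be replaced by a plain, lowercase pattern. A minimal runnable sketch, reusing the question's sample data:

```python
import pandas as pd

# Sample frame mirroring the question's data.
df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

# case=False makes the match case-insensitive; str.match anchors at the
# start of the string, so no leading ^ is needed.
filtered = df[~df['Col_A'].str.match('foo', case=False)]
# Keeps only the 'Bar' row, same as the regex version above.
```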
pythonic833
  • This solution is just what I needed and seems to be moldable for other situations I need to do this in. Thank you so much. – Trace R. Aug 21 '19 at 23:48

Another method would be to use str.startswith with str.lower and the NOT operator ~:

df[~df['Col_A'].str.lower().str.startswith('foo')]

Output

  Col_A Col_B
1   Bar  Bar2
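Since the question says "contains a partial" match, and both answers only test the start of the string, here is a hedged variant using str.contains, which looks anywhere in the string. The extra 'XFoo' row is a hypothetical addition to show the difference:

```python
import pandas as pd

# Question's data plus one hypothetical row where 'foo' is not a prefix.
df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3'],
                   ['XFoo', 'XFoo4']],
                  columns=['Col_A', 'Col_B'])

# str.contains matches anywhere in the string; case=False makes it
# case-insensitive, and regex=False treats 'foo' as a literal substring.
filtered = df[~df['Col_A'].str.contains('foo', case=False, regex=False)]
# Drops 'Foo', 'FooBar', and 'XFoo', keeping only the 'Bar' row.
```

A startswith-based filter would keep 'XFoo'; str.contains drops it, so pick whichever matches the intended semantics.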
Erfan