How to Delete Rows That Contains Some Text in Pandas

Question

I have a dataframe where I want to drop some rows in my that contain some text.

Date      Campaign
3/24/20   GA Shoes Search Campaign
3/24/20   GA Shoes Display Campaign
3/24/20   GA Bag Search Campaign
3/24/20   GA Bag Display Campaign
3/24/20   IG Shoes Campaign
3/24/20   IG Bag Campaign
3/24/20   FB Shoes Campaign
3/24/20   FB Bag Campaign
3/24/20   Email Campaign

I want to delete all rows and leave the rows that contain GA. Here's the result I want:

Date      Campaign
3/24/20   GA Shoes Search Campaign
3/24/20   GA Shoes Display Campaign
3/24/20   GA Bag Search Campaign
3/24/20   GA Bag Display Campaign

I tried to do this:

mask = df['Campaign'].str.contains('FB')
idx = df.index[mask]
new = df.drop(idx,axis=0)

However, it only works if I do one string at a time. I tried to do this to save time, but it didn't work:

mask = df['Campaign'].str.contains('FB', 'Email', 'IG')
idx = df.index[mask]
new = df.drop(idx,axis=0)

score 4 · Accepted Answer · answered Aug 21 '20 at 21:22

4

Rather than dropping the rows that contain the other letters, you could just apply a function to grab the rows that do contain 'GA' :

new = df[df['Campaign'].apply(lambda x: 'GA' in x)]

answered Aug 21 '20 at 21:22

SimonR

1,774
1
4
10

score 3 · Answer 2 · answered Aug 21 '20 at 21:28

3

The assumption here is that for all relevant rows, GA is at the start of the sentence. Pandas str startswith can help here:

df.loc[df.Campaign.str.startswith("GA")]

    Date    Campaign
0   3/24/20 GA Shoes Search Campaign
1   3/24/20 GA Shoes Display Campaign
2   3/24/20 GA Bag Search Campaign
3   3/24/20 GA Bag Display Campaign

If however, GA might be embedded within the sentence and not at the very beginning, it would be helpful if you provided data similar to that. That way, it can be determined if GA is within words, or on its own, or whatever, and hopefully find a solution that fits

answered Aug 21 '20 at 21:28

sammywemmy

27,093
4
17
31

Hi sammywemmy, what if GA has other variations like ZGA, can I do df.loc[df.Campaign.str.startswith("GA", "ZGA)]? Or would it only work one at time? – ssmm Aug 21 '20 at 21:41
1

@ssmm, for multiple entries in `startswith`, use a tuple : `df.loc[df.Campaign.str.startswith(("GA", "ZGA"))]` – sammywemmy Aug 21 '20 at 21:43

zabop · Answer 3 · 2020-08-21T21:39:42.947

Setup example dataframe

If you have a dataframe:

df = pd.DataFrame({'x': ['A0', 'A1', 'B2', 'A3'],
                   'y': ['B0', 'B1', 'B2', 'B3'],
                   'z': ['A0', 'C1', 'C2', 'C3'],
                   'w': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])

Which looks like:

and let's say, you want to create those rows which contain A in column x.

Methods

`str.contains`:

You can do:

df[df['x'].str.contains('A')]

List comprehension

df[['A' in each for each in df['x']]] will suffice.

`apply()`:

If you are into apply(), can do:

df[df['x'].apply(lambda x: 'A' in x)]

Results

All of these methods will give you:

Final notes More generally speaking:

The str.contains method:

df[df[name_of_column_which_should_contain_something].str.contains(what_should_it_contain)]

The list comprehension method:

df[[what_to_search_for in each for each in df[whichcolumn]]]
The apply() method:

df[df[which_column_to_search_in].apply(lambda x: what_to_search_for in x)]

I got an error that said: "unsupported operand type(s) for &: 'str' and 'int'". Here's the code: df1[df1['Order'].str.contains('FB','Email', 'IG')] — ssmm, Aug 21 '20 at 21:32
Hmm, for which method did you get that error? (Maybe try the others?) — zabop, Aug 21 '20 at 21:34