1

I have a dataframe where I want to drop some rows in my that contain some text.

Date      Campaign
3/24/20   GA Shoes Search Campaign
3/24/20   GA Shoes Display Campaign
3/24/20   GA Bag Search Campaign
3/24/20   GA Bag Display Campaign
3/24/20   IG Shoes Campaign
3/24/20   IG Bag Campaign
3/24/20   FB Shoes Campaign
3/24/20   FB Bag Campaign
3/24/20   Email Campaign

I want to delete all rows and leave the rows that contain GA. Here's the result I want:

Date      Campaign
3/24/20   GA Shoes Search Campaign
3/24/20   GA Shoes Display Campaign
3/24/20   GA Bag Search Campaign
3/24/20   GA Bag Display Campaign

I tried to do this:

mask = df['Campaign'].str.contains('FB')
idx = df.index[mask]
new = df.drop(idx,axis=0)

However, it only works if I do one string at a time. I tried to do this to save time, but it didn't work:

mask = df['Campaign'].str.contains('FB', 'Email', 'IG')
idx = df.index[mask]
new = df.drop(idx,axis=0)
ssmm
  • 149
  • 2
  • 15

3 Answers3

4

Rather than dropping the rows that contain the other letters, you could just apply a function to grab the rows that do contain 'GA' :

new = df[df['Campaign'].apply(lambda x: 'GA' in x)]

SimonR
  • 1,774
  • 1
  • 4
  • 10
3

The assumption here is that for all relevant rows, GA is at the start of the sentence. Pandas str startswith can help here:

df.loc[df.Campaign.str.startswith("GA")]

    Date    Campaign
0   3/24/20 GA Shoes Search Campaign
1   3/24/20 GA Shoes Display Campaign
2   3/24/20 GA Bag Search Campaign
3   3/24/20 GA Bag Display Campaign

If however, GA might be embedded within the sentence and not at the very beginning, it would be helpful if you provided data similar to that. That way, it can be determined if GA is within words, or on its own, or whatever, and hopefully find a solution that fits

sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • Hi sammywemmy, what if GA has other variations like ZGA, can I do df.loc[df.Campaign.str.startswith("GA", "ZGA)]? Or would it only work one at time? – ssmm Aug 21 '20 at 21:41
  • 1
    @ssmm, for multiple entries in `startswith`, use a tuple : `df.loc[df.Campaign.str.startswith(("GA", "ZGA"))]` – sammywemmy Aug 21 '20 at 21:43
1

Setup example dataframe

If you have a dataframe:

df = pd.DataFrame({'x': ['A0', 'A1', 'B2', 'A3'],
                   'y': ['B0', 'B1', 'B2', 'B3'],
                   'z': ['A0', 'C1', 'C2', 'C3'],
                   'w': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])

Which looks like:

enter image description here

and let's say, you want to create those rows which contain A in column x.


Methods

str.contains:

You can do:

df[df['x'].str.contains('A')]

List comprehension

df[['A' in each for each in df['x']]] will suffice.


apply():

If you are into apply(), can do:

df[df['x'].apply(lambda x: 'A' in x)]

Results

All of these methods will give you:

enter image description here


Final notes More generally speaking:

df[df[name_of_column_which_should_contain_something].str.contains(what_should_it_contain)]

  • The list comprehension method:

    df[[what_to_search_for in each for each in df[whichcolumn]]]

  • The apply() method:

    df[df[which_column_to_search_in].apply(lambda x: what_to_search_for in x)]

zabop
  • 6,750
  • 3
  • 39
  • 84
  • I got an error that said: "unsupported operand type(s) for &: 'str' and 'int'". Here's the code: df1[df1['Order'].str.contains('FB','Email', 'IG')] – ssmm Aug 21 '20 at 21:32
  • Hmm, for which method did you get that error? (Maybe try the others?) – zabop Aug 21 '20 at 21:34