2

I am working on Twitter data and trying to find strings that contain several given words. The following lines work for a single word and for an OR condition.

tweets_text[tweets_text.str.contains("break")] #Find strings with the word break

tweets_text[tweets_text.str.contains("break|social|media")] #Find strings with either break or social, or media

I am trying to find the strings that contain all three words ("break" AND "social" AND "media").

smci
Anas Baheh
  • Do you care about the order in which 'break','social','media' can occur? There are 3! = 6 possible orders, in theory. Could 'break' occur between 'social' and 'media'? – smci Jun 17 '21 at 10:03
  • Related: [Select by partial string from a pandas DataFrame](https://stackoverflow.com/questions/11350770/select-by-partial-string-from-a-pandas-dataframe) – smci Jun 17 '21 at 10:06

3 Answers

3
import pandas as pd

tweets_text = pd.Series(['break', 'break media social', 'break media'])

Series:

0                 break
1    break media social
2           break media

Extraction:

tweets_text[tweets_text.str.contains('(?=.*break)(?=.*social)(?=.*media)')]

output:

1    break media social
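
If the word list varies, the lookahead pattern above can be assembled programmatically. This is a minimal sketch, not part of the original answer: the helper name `build_all_words_pattern` is hypothetical, and `re.escape` is added on the assumption that the words should be matched literally.

```python
import re

import pandas as pd


def build_all_words_pattern(words):
    # Hypothetical helper: one lookahead per word, so every word
    # must occur somewhere in the string, in any order.
    return "".join(f"(?=.*{re.escape(word)})" for word in words)


tweets_text = pd.Series(["break", "break media social", "break media"])
pattern = build_all_words_pattern(["break", "social", "media"])
matches = tweets_text[tweets_text.str.contains(pattern)]
print(matches)
```

Note that `.` does not match newlines by default, so for tweets that span several lines you may need to pass `flags=re.DOTALL` to `str.contains`.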
MAFiA303
1

You can split them up like this:

tweets_text.loc[tweets_text.str.contains("break") & tweets_text.str.contains("social") & tweets_text.str.contains("media")]
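
For an arbitrary list of words, the same AND combination can be built without writing each condition by hand. The following is a sketch under the assumption that plain substring matching is wanted (hence `regex=False`); the sample data and the names `words`, `masks`, and `all_present` are illustrative only.

```python
from functools import reduce
import operator

import pandas as pd

tweets_text = pd.Series(["break", "break media social", "break media"])
words = ["break", "social", "media"]

# Build one boolean mask per word, then AND them together element-wise.
masks = [tweets_text.str.contains(word, regex=False) for word in words]
all_present = reduce(operator.and_, masks)

result = tweets_text.loc[all_present]
print(result)
```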
Rutger
0

You can also pass additional parameters, such as flags, to make the match case-insensitive. Building on @Rutger's code (check the documentation for further parameters):

import re

tweets_text.loc[tweets_text.str.contains("break", flags=re.IGNORECASE) & tweets_text.str.contains("social", flags=re.IGNORECASE) & tweets_text.str.contains("media", flags=re.IGNORECASE)]
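
`str.contains` also accepts `case=False`, which is a shorter way to ignore case, and `na=False`, which treats missing values as non-matches. A sketch with made-up sample data, since real tweet columns may contain missing entries:

```python
import pandas as pd

# Illustrative data only; the last entry simulates a missing tweet.
tweets_text = pd.Series(["Take a BREAK from Social Media", "social media", None])

# Each condition ignores case and counts missing values as "no match".
mask = (
    tweets_text.str.contains("break", case=False, na=False)
    & tweets_text.str.contains("social", case=False, na=False)
    & tweets_text.str.contains("media", case=False, na=False)
)
result = tweets_text.loc[mask]
print(result)
```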

Alternatively, you can do the same thing by combining a lambda function with all(), as follows:

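def find_words(data, column_name, list_of_words):
    # True only if every word occurs in the row, ignoring case
    function = lambda row: all(word.lower() in row.lower()
                               for word in list_of_words)

    # column_name is the DataFrame column that holds the tweet text
    return data.loc[data[column_name].apply(function)]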
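
A usage sketch for this approach, assuming the tweets live in a DataFrame column named `text` (a hypothetical name) and that the column name is passed in as an argument:

```python
import pandas as pd


def find_words(data, column_name, list_of_words):
    # True only when every word occurs in the row, ignoring case.
    contains_all = lambda row: all(
        word.lower() in row.lower() for word in list_of_words
    )
    return data.loc[data[column_name].apply(contains_all)]


# Illustrative data; "text" is an assumed column name.
tweets = pd.DataFrame({"text": ["Break", "break MEDIA social", "break media"]})
found = find_words(tweets, "text", ["break", "social", "media"])
print(found)
```

Because this uses Python's `in` operator rather than a regular expression, the words are matched as literal substrings.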
Paschalis Ag