0

I am trying to filter rows in dataframe by multiple strings and I have searched and found this

search_values = ['vba','google']
df[df[0].str.contains('|'.join(search_values), case=False)]

But this I think based on finding either of the two strings vba or google. How can I join the both strings to be used AND not OR. I mean that the filter should be done if both the strings are there in the column so if this sentence for example I mean vba will be with google in one sentence. This row would be selected because it has both vba and google

YasserKhalil
  • 9,138
  • 7
  • 36
  • 95

1 Answers1

1

The contains function uses regex to find the rows that match the string. Using | is like OR in regex. If you want rows that contain both, this should work:

search_values = ['vba','google']
df[df[0].str.contains(f'{search_values[0]}.*{search_values[1]}|{search_values[1]}.*{search_values[0]}', case=False)]

The .* part means that any string can be found between the searched terms. So we try to find rows with 'vba', some string and then 'google', or rows with 'google', some string and then 'vba'.

Daniel Geffen
  • 1,777
  • 1
  • 11
  • 16
  • Amazing. Thanks a lot. How can I add more terms to the list search_values? Is there an easy way? – YasserKhalil Dec 05 '20 at 20:42
  • There are some other ways to do this that might be easier while searching for multiple terms. You can look at this question for example: https://stackoverflow.com/questions/4389644/regex-to-match-string-containing-two-names-in-any-order – Daniel Geffen Dec 05 '20 at 20:53
  • Thanks a lot. I am lost in the link you posted. – YasserKhalil Dec 05 '20 at 20:57