Consider:

array = ['...', '...', '...', ...]
results = df[df['Message'].str.contains('|'.join(array)).fillna(False)]

How can we force str.contains to match only whole words from array?

JAN
  • Have a look at [python-pandas-series-str-contains-whole-word](https://stackoverflow.com/questions/39359601/python-pandas-series-str-contains-whole-word/39359789) – Anurag Dabas Jul 14 '21 at 16:45
  • @AnuragDabas: Yeah, I tried this: `pattern = '\b' + '|'.join(array) + '\b' results = df[df['Message'].str.contains(pattern).fillna(False)]`, but it doesn't work. – JAN Jul 14 '21 at 16:50
  • Try escaping the `\b` and also wrap the strings in () like this: `pattern = '\\b(' + '|'.join(arr) + ')\\b'`. Because of the group, `contains` now produces a warning; a non-capturing group `(?:...)` avoids it, whereas `match` only matches at the start of each string. – Emma Jul 14 '21 at 17:15
  • @Emma: Post it as an answer and I'll accept it! – JAN Jul 14 '21 at 17:18

1 Answer

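You'll need to wrap all the words in a group, (w1|w2|w3), so that any word in the array can match. Then add a word boundary, \b, on both sides, with the backslash escaped (an unescaped '\b' in a normal Python string is a backspace character, not a word boundary).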

pattern = '\\b(' + '|'.join(arr) + ')\\b'
df[df['Message'].str.contains(pattern).fillna(False)]
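
As a minimal sketch of the pattern above on toy data: the `words` list and the contents of `df` are hypothetical and not from the question. Two swaps are made here as assumptions: raw strings replace the doubled backslashes, and `re.escape` is added as a precaution in case a word contains regex metacharacters. The `na=False` argument of `str.contains` stands in for `.fillna(False)`.

```python
import re
import pandas as pd

# Hypothetical sample data for illustration only.
words = ["cat", "dog"]
df = pd.DataFrame({"Message": ["my cat sleeps", "concatenate", "hot dog", None]})

# r"\b" is a regex word boundary; a plain "\b" would be a backspace character.
# The capturing group mirrors the answer's pattern, so pandas may emit a
# UserWarning about match groups when this runs.
pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"

# na=False treats missing messages as non-matches.
mask = df["Message"].str.contains(pattern, na=False)
results = df[mask]
print(results)
```

Only rows where one of the words appears as a whole word are kept, so "concatenate" is excluded even though it contains "cat".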

Now, since I added the capture group (), contains will produce a warning.

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

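To avoid this warning, make the group non-capturing with (?:...) and keep contains. Do not switch to match: str.match only matches at the start of each string, so it misses words that appear later in the message.

pattern = '\\b(?:' + '|'.join(arr) + ')\\b'
df[df['Message'].str.contains(pattern).fillna(False)]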
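
The difference between the two methods can be checked on a toy series. This is only a sketch: the sample strings are hypothetical, and the pattern uses a non-capturing group `(?:...)` so that no match-groups warning is raised. `str.contains` searches anywhere in the string, while `str.match` anchors the pattern at the start of each string.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series(["cat nap", "my cat sleeps", "concatenate"])

# Non-capturing group: same alternation, but no capture group is created.
pattern = r"\b(?:cat|dog)\b"

found_anywhere = s.str.contains(pattern)  # searches the whole string
found_at_start = s.str.match(pattern)     # only matches at position 0

print(found_anywhere.tolist())
print(found_at_start.tolist())
```

With `str.match`, "my cat sleeps" is not reported because the word is not at the beginning of the string, which is usually not what a whole-word filter wants.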
Emma
  • `str.contains` produced the expected results for me, whereas `str.match` failed to find anything; every search with `str.match` came back empty. – StuckInPhDNoMore Oct 06 '22 at 10:11