Consider:
array = ['...', '...', '...', ...]
results = df[df['Message'].str.contains('|'.join(array)).fillna(False)]
How can we force str.contains to match only WHOLE WORDS from array?
You'll need to wrap all the words in a group, (w1|w2|w3), so that the pattern matches any word in the array. Then add a word boundary, \b, on both sides of the group, with the backslash escaped.
pattern = '\\b(' + '|'.join(array) + ')\\b'
df[df['Message'].str.contains(pattern).fillna(False)]
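To see what the word boundaries change, here is a minimal sketch using Python's re module directly. The word list and the test strings are made up for illustration and are not from the question.

```python
import re

# Hypothetical word list, standing in for the elided array above.
words = ["cat", "dog"]

# Plain alternation: matches the words anywhere, even inside longer words.
plain = "|".join(words)

# Same alternation wrapped in a group with \b on both sides: whole words only.
bounded = r"\b(" + "|".join(words) + r")\b"

# "cat" occurs inside "concatenate", so the plain pattern finds it.
print(bool(re.search(plain, "concatenate")))

# With word boundaries, the embedded "cat" no longer counts.
print(bool(re.search(bounded, "concatenate")))

# A standalone "cat" still matches.
print(bool(re.search(bounded, "my cat sleeps")))
```

The group matters: without the parentheses, \b would bind only to the first and last alternatives rather than to every word.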
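Because the pattern now contains a capturing group (), contains will produce a warning:
UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
To avoid this warning, make the group non-capturing with (?:...). Switching contains to match is not equivalent: match only matches at the start of each string, so it would miss words that appear later in the message.
pattern = '\\b(?:' + '|'.join(array) + ')\\b'
df[df['Message'].str.contains(pattern).fillna(False)]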
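For comparison, a small self-contained sketch of how contains and match behave on sample data. The DataFrame contents and the word list are hypothetical. Two things here are additions, not part of the original answer: the words are passed through re.escape in case they contain regex metacharacters, and the group is written as non-capturing, (?:...), which keeps contains from emitting the match-groups warning. The na=False argument is used in place of .fillna(False).

```python
import re
import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame(
    {"Message": ["cat nap", "the cat sat", "concatenate strings", "hot dog stand", None]}
)
words = ["cat", "dog"]  # hypothetical word list

# Escape each word, join with |, wrap in a non-capturing group, add \b on both sides.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

# contains searches anywhere in the string.
contains_mask = df["Message"].str.contains(pattern, na=False)

# match is anchored at the start of the string.
match_mask = df["Message"].str.match(pattern, na=False)

results = df[contains_mask]
print(results)
print(match_mask.tolist())
```

On this data, contains keeps the three rows where "cat" or "dog" appears as a whole word, while match keeps only the row that begins with one of the words.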