I'd like to search for a list of keywords in a text column and select all rows where the exact keywords exist. I know this question has many duplicates, but I can't understand why the solution is not working in my case.
keywords = ['fake', 'false', 'lie']
df1:
| | text |
---|---|
19152 | I think she is the Corona Virus.... |
19154 | Boy you hate to see that. I mean seeing how it was contained and all. |
19155 | Tell her it’s just the fake flu, it will go away in a few days. |
19235 | Is this fake news? |
... | ... |
20540 | She’ll believe it’s just alternative facts. |
Expected results: I'd like to select rows that contain any of the keywords in my list ('fake', 'false', 'lie') as whole words. For example, in the above df, it should return rows 19155 and 19235.
str.contains()
df1[df1['text'].str.contains("|".join(keywords))]
The problem with str.contains() is that the result is not limited to the exact keywords. For example, it returns sentences containing "believe" (e.g., row 20540) because "lie" is a substring of "believe"!
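To make the substring problem reproducible, here is a minimal sketch using four of the rows shown above. The second pattern is one possible way to restrict matching to whole words: it assumes regex word boundaries (`\b`) are an acceptable definition of "exact keyword". The names `substring_pattern`, `word_pattern`, `substring_hits`, and `word_hits` are only illustrative.

```python
import re
import pandas as pd

keywords = ['fake', 'false', 'lie']

# Small sample built from the rows shown in the question.
df1 = pd.DataFrame(
    {'text': [
        'I think she is the Corona Virus....',
        'Tell her it’s just the fake flu, it will go away in a few days.',
        'Is this fake news?',
        'She’ll believe it’s just alternative facts.',
    ]},
    index=[19152, 19155, 19235, 20540],
)

# Plain alternation: 'lie' also matches inside 'believe'.
substring_pattern = '|'.join(keywords)
substring_hits = df1[df1['text'].str.contains(substring_pattern)]

# Assumption: wrapping the alternation in \b...\b limits matches to whole words.
# re.escape guards against keywords that contain regex metacharacters.
word_pattern = r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')\b'
word_hits = df1[df1['text'].str.contains(word_pattern)]

print(list(substring_hits.index))
print(list(word_hits.index))
```

With this sample, the plain pattern picks up row 20540 as well, while the word-boundary pattern keeps only rows 19155 and 19235. Matching is still case-sensitive here; passing `case=False` to `str.contains` would be needed to catch "Fake" as well.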
pandas.Series.isin
To find the rows including the exact keywords, I used pd.Series.isin:
df1[df1.text.isin(keywords)]
#df1[df1['text'].isin(keywords)]
Even though I see there are matches in df1, it doesn't return anything.
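A small sketch of what `isin` appears to be doing here: it compares each cell's entire value against the list, rather than searching inside the string. The row with index 0 is hypothetical and only added for contrast.

```python
import pandas as pd

keywords = ['fake', 'false', 'lie']

# Row 19235 is from the question; row 0 is a hypothetical cell that
# consists of nothing but a keyword.
s = pd.Series(['Is this fake news?', 'fake'], index=[19235, 0])

# isin tests whole-cell equality against the list, not substring membership.
mask = s.isin(keywords)
print(mask.tolist())
```

Only the cell that is exactly equal to a keyword is flagged, which would explain the empty result on df1, where every cell is a full sentence.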