I am parsing a pandas dataframe df1
containing string object rows. I have a reference list of keywords and need to delete every row in df1
containing any word from the reference list.
Currently, I do it like this:
reference_list = ["words", "to", "remove"]
df1 = df1[~df1[0].str.contains(r"words")]
df1 = df1[~df1[0].str.contains(r"to")]
df1 = df1[~df1[0].str.contains(r"remove")]
This is not scalable to thousands of words. However, when I do:
df1 = df1[~df1[0].str.contains(reference_word for reference_word in reference_list)]
I get the error: first argument must be string or compiled pattern.
Following this solution, I tried:
reference_list = "words|to|remove"
df1 = df1[~df1[0].str.contains(reference_list)]
This doesn't raise an exception, but it doesn't match all of the words either.
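For reference, here is a minimal, self-contained sketch of what I am trying to achieve. The sample rows are made up, and treating the keywords as whole, case-insensitive words is my assumption about the desired behavior. The pattern is built from the list with re.escape so that any regex metacharacters in the keywords are taken literally.

```python
import re
import pandas as pd

# Hypothetical sample data; column 0 holds the strings, as in df1 above.
df1 = pd.DataFrame({0: [
    "some words here",
    "nothing matches",
    "please remove me",
    "tomato soup",
    "Go TO town",
]})
reference_list = ["words", "to", "remove"]

# Escape each keyword and join them into a single alternation.
# The \b anchors (an assumption) restrict matches to whole words,
# so "to" does not match inside "tomato".
pattern = r"\b(?:" + "|".join(map(re.escape, reference_list)) + r")\b"

# case=False (also an assumption) ignores case; na=False treats missing values as non-matches.
mask = df1[0].str.contains(pattern, case=False, regex=True, na=False)
filtered = df1[~mask]
```

With these sample rows, only "nothing matches" and "tomato soup" are kept.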
How can I effectively use str.contains with a list of words?