I have a pandas DataFrame where one column ('processed') holds, in each row, a single string of pre-processed text of varying length.
I want to search using a list of keywords of arbitrary length, and return only the processed notes for rows where the 'processed' string contains ALL of the elements in the list.
Of course, I can search the terms individually, like:
words = ['searchterm1', 'searchterm2']
notes = df.loc[(df.processed.str.contains(words[0])) & (df.processed.str.contains(words[1]))].processed
But this seems inefficient, and would require different code depending on the number of search terms I'm using.
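A possible way to avoid writing a different expression for each number of terms (a sketch, not the only approach) is to build one mask per word and fold them together with `&` using `functools.reduce`. The sample `df` below is hypothetical and only mirrors the example strings in this question. `regex=False` assumes the search terms are plain literals rather than regular expressions. It also assumes `words` is non-empty, because `reduce` raises `TypeError` on an empty list with no initial value.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical sample data mirroring the example strings in the question.
df = pd.DataFrame({"processed": [
    "searchterm1 foo bar searchterm2",
    "foo bar searchterm1",
    "searchterm2",
]})
words = ["searchterm1", "searchterm2"]

# One boolean Series per term (assumption: terms are literal substrings, not regexes).
masks = [df.processed.str.contains(w, regex=False) for w in words]

# Equivalent to masks[0] & masks[1] & ... for a non-empty list of any length.
combined = reduce(operator.and_, masks)

notes = df.loc[combined, "processed"]
print(notes.tolist())  # ['searchterm1 foo bar searchterm2']
```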
What I'm looking for is something like:
notes = (df.loc[[(df.processed.str.contains(words[i])) for i in range(len(words))]]).processed
This would include "searchterm1 foo bar searchterm2" but NOT "foo bar searchterm1" or "searchterm2".
But this doesn't work: .loc doesn't accept a generator object or a list of boolean Series as a row selector.
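One way the list-comprehension idea might be made to work (a sketch under the same assumptions: hypothetical sample data, literal search terms) is to collapse the list of masks into a single boolean array before handing it to `.loc`. `np.logical_and.reduce` stacks the Series into a 2-D array and ANDs along the first axis, leaving one value per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the example strings in the question.
df = pd.DataFrame({"processed": [
    "searchterm1 foo bar searchterm2",
    "foo bar searchterm1",
    "searchterm2",
]})
words = ["searchterm1", "searchterm2"]

# The same kind of list as in the attempt: one boolean Series per term.
masks = [df.processed.str.contains(w, regex=False) for w in words]

# .loc can't take the list directly, so reduce it to one mask:
# True only in rows where every term was found.
combined = np.logical_and.reduce(masks)

notes = df.loc[combined].processed
```

If you would rather keep a pandas object aligned on the index, `pd.concat(masks, axis=1).all(axis=1)` should produce the equivalent boolean Series.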
So what is the best way to select only the rows whose string contains all of several substrings? Thanks!
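A different technique, sketched here as an assumption rather than a recommendation, is a single `str.contains` call with one regex lookahead per term: `^(?=.*term1)(?=.*term2)` matches only when every term appears somewhere in the string, in any order. `re.escape` keeps each term literal, and `re.DOTALL` is included in case the processed text contains newlines. The sample `df` is again hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the example strings in the question.
df = pd.DataFrame({"processed": [
    "searchterm1 foo bar searchterm2",
    "foo bar searchterm1",
    "searchterm2",
]})
words = ["searchterm1", "searchterm2"]

# Build "^(?=.*term1)(?=.*term2)...": each lookahead checks one term from the
# start of the string, so the terms may appear in any order.
pattern = "^" + "".join(f"(?=.*{re.escape(w)})" for w in words)

# DOTALL lets .* cross newlines if the text spans several lines.
mask = df.processed.str.contains(pattern, flags=re.DOTALL)
notes = df.loc[mask, "processed"]
```

This replaces the per-term masks with one pattern, but whether it is faster than combining several literal `str.contains` calls depends on the data, so it is worth timing both on the real column.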