I am trying to return the rows in which a column contains a line break immediately followed by a specific word, i.e. '\nWord'.
Here is a minimal example:
import pandas as pd

testdf = pd.DataFrame([
    ['test1', ' generates the final summary. \nRESULTS We evaluate the performance of '],
    ['test2', 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'],
    ['test2', 'anthropology with RESULTS pharmacology and biology'],
])
testdf.columns = ['A', 'B']
testdf.head()
> A B
>0 test1 generates the final summary. \nRESULTS We evaluate the performance of
>1 test2 the cat and bat \n\n\nRESULTS\n teamed up to find some food
>2 test2 anthropology with RESULTS pharmacology and biology
Here is what I tried:
listStrings = {'\nRESULTS\n'}
testdf.loc[testdf.B.apply(lambda x: len(listStrings.intersection(x.split())) >= 1)]
This returns an empty DataFrame.
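A likely reason the filter matches nothing, shown as a small sketch: `str.split()` called with no arguments splits on any run of whitespace, newlines included, so none of the resulting tokens can contain '\n'. The names `text` and `tokens` below are only for illustration.

```python
# One of the example cells from column B.
text = 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'

# split() with no arguments treats spaces and newlines alike as separators,
# so the line breaks are discarded before the set intersection is computed.
tokens = text.split()

print('RESULTS' in tokens)        # the bare word survives as a token
print('\nRESULTS\n' in tokens)    # the newline-wrapped form never appears
print('\nRESULTS' in text)        # the raw string still contains the pattern
```

Any check that depends on the line break therefore has to run against the original string rather than the split tokens.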
The result I am trying to produce is the first two rows, since they contain '\nRESULTS', but NOT the last row, since its 'RESULTS' is not preceded by a line break.
So the expected output is:
> A B
>0 test1 generates the final summary. \nRESULTS We evaluate the performance of
>1 test2 the cat and bat \n\n\nRESULTS\n teamed up to find some food
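One possible way to get this output, as a sketch rather than a definitive answer: test the raw strings with `Series.str.contains`, using a literal (non-regex) match on '\nRESULTS'. The names `mask` and `result` are illustrative, and `na=False` is an assumption that missing values should count as non-matches.

```python
import pandas as pd

testdf = pd.DataFrame(
    [
        ['test1', ' generates the final summary. \nRESULTS We evaluate the performance of '],
        ['test2', 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'],
        ['test2', 'anthropology with RESULTS pharmacology and biology'],
    ],
    columns=['A', 'B'],
)

# Literal substring match on the unsplit text, so the newline is preserved.
# na=False makes missing values evaluate to False instead of NaN.
mask = testdf['B'].str.contains('\nRESULTS', regex=False, na=False)

result = testdf.loc[mask]
print(result)
```

If the word must not be followed by other word characters (for example to exclude '\nRESULTSET'), a regex such as r'\nRESULTS\b' with the default `regex=True` could be used instead.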