My question is very similar to "How to test if a string contains one of the substrings in a list, in pandas?", except that the list of substrings to check varies by observation and is stored in a list column. Is there a way to access that list in a vectorized way by referring to the series?
Example dataset:
import pandas as pd
df = pd.DataFrame([{'a': 'Bob Smith is great.', 'b': ['Smith', 'foo']},
                   {'a': 'The Sun is a mass of incandescent gas.', 'b': ['Jones', 'bar']}])
print(df)
I'd like to generate a third column, 'c', that equals 1 if any of the 'b' strings is a substring of 'a' for its respective row, and zero otherwise. That is, I'd expect in this case:
                                        a             b  c
0                     Bob Smith is great.  [Smith, foo]  1
1  The Sun is a mass of incandescent gas.  [Jones, bar]  0
My attempt:
df['c'] = df.a.str.contains('|'.join(df.b)) # Does not work.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_4092606/761645043.py in <module>
----> 1 df['c'] = df.a.str.contains('|'.join(df.b)) # Does not work.
TypeError: sequence item 0: expected str instance, list found
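As far as I can tell, the error arises because '|'.join(df.b) iterates over the elements of the series, and each element is a list rather than a string. For reference, a plain row-wise comprehension does produce the column I want. This is only a minimal sketch, assuming every entry in 'b' is a list of strings with no missing values, and it is not vectorized, which is what I was hoping to avoid:

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 'Bob Smith is great.', 'b': ['Smith', 'foo']},
    {'a': 'The Sun is a mass of incandescent gas.', 'b': ['Jones', 'bar']},
])

# Row-wise check: 1 if any substring listed in 'b' occurs in 'a', else 0.
# Uses plain `in` matching, so substrings are treated literally, not as regexes.
df['c'] = [
    int(any(sub in text for sub in subs))
    for text, subs in zip(df['a'], df['b'])
]
print(df)
```

Unlike str.contains with a joined pattern, this treats each substring literally, so characters such as '.' or '|' in 'b' would not be interpreted as regex syntax.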