How to test if a string contains one of the substrings in a list, in pandas?

In the question linked above, many people recommended the accepted answer, which is:

searchfor = ['og', 'at']
s[s.str.contains('|'.join(searchfor))]

In my case, the searchfor list has 300+ entries, and s has several thousand rows. With the approach above, I can quickly check whether any of the keywords appears in the column. However, I am not sure how to create another column in s that shows which searchfor words were found in each row.

For example, if

s.loc[0,'fulltext'] = 'ff og at ew'

, then

s.loc[0, 'found_keyword'] = ['og', 'at']

if

s.loc[1, 'fulltext'] = 'ff og ew gg'

, then

s.loc[1, 'found_keyword'] = ['og']

Any recommendations would be much appreciated.

SSS
  • One solution that comes to mind is to use `applymap` with a `lambda`, or `.iterrows`, and go through each row, I guess... – sammy Feb 01 '21 at 02:07

1 Answer

Try str.extractall:

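# assumes pattern is the keyword list from the question joined with '|'
pattern = '|'.join(searchfor)
df['found_keyword'] = (df['fulltext'].str.extractall(f'({pattern})')
                           .groupby(level=0)[0].agg(list)
                      )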

Output:

      fulltext found_keyword
0  ff og at ew      [og, at]
1  ff og ew gg          [og]
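
For reference, here is a minimal self-contained sketch of the same idea. It assumes the pattern is a single string built by joining the keywords with '|'. The sample frame, the extra non-matching row, and the use of re.escape are illustrative additions, not part of the original snippet.

```python
import re
import pandas as pd

searchfor = ['og', 'at']
# Sample data; the third row is a hypothetical row with no keyword in it.
df = pd.DataFrame({'fulltext': ['ff og at ew', 'ff og ew gg', 'zzz']})

# Escape each keyword so regex metacharacters are matched literally,
# then join everything into one alternation string.
pattern = '|'.join(re.escape(k) for k in searchfor)

# extractall returns one row per match, indexed by (original row, match number);
# grouping on the original row index collects the matches into a list.
found = (df['fulltext'].str.extractall(f'({pattern})')
           .groupby(level=0)[0].agg(list))

# Assignment aligns on the index, so rows without any match end up as NaN.
df['found_keyword'] = found
print(df)
```

Note that pattern has to be one string here, not the list itself. If keywords should only match as whole words, wrapping the alternation in \b...\b is one option.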
Quang Hoang
  • Hi! Thank you for your reply! I just tried it but ran into an issue. My patterns are something like ['Arizona', 'Florida', 'Cleveland', ...], which are the state names. When I run the code, a cell containing 'Florida USA' becomes [F, l, o, r, i, d, a, , U, S, A]. It seems to just break the fulltext down into individual characters. Am I using it the wrong way? – SSS Feb 01 '21 at 21:11