I have a list of strings. I need to go through the rows of my DataFrame and check whether one or more of the list items occur as a substring of the value in one (string) column. I then need to assign the matched value(s) to a new column, or NaN if there is no match. I need all matched parts of the string, not just any one of them. So, for the third row of my df, these would be both 'E' and 'F22'.

import pandas as pd

df = pd.DataFrame({'type': ['A23 E I28', 'I28 F A23', 'D41 E F22']})
matches = ['E', 'F22']
stjepan
  • Have a look: [How to test if a string contains one of the substrings in a list, in pandas?](https://stackoverflow.com/a/26577689/10140310) – help-ukraine-now Aug 13 '19 at 18:59
  • Possible duplicate of [Select rows from a DataFrame based on values in a column in pandas](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) – Andrew Drake Aug 13 '19 at 19:03

2 Answers

Is this what you're looking for?

If there's a match, the first matched keyword is assigned to a new column:

df['new_col'] = df['type'].str.extract(f"({'|'.join(matches)})")
        type new_col
0  A23 E I28       E
1  I28 F A23     NaN
2  D41 E F22       E

Edit: to collect all matches rather than only the first one:

import numpy as np

# findall returns every match per row; rows with no match become NaN
df['new_col'] = (df['type']
                 .str.findall(f"({'|'.join(matches)})")
                 .str.join(', ')
                 .replace('', np.nan))
        type new_col
0  A23 E I28       E
1  I28 F A23     NaN
2  D41 E F22  E, F22
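
The pattern above is built by joining the list items directly, so it treats them as regular expressions. A minimal sketch of the same idea with two defensive tweaks follows; it assumes the items should be matched literally and that one item may be a prefix of another. Neither tweak is needed for the sample data, and the names `ordered`, `pattern` and `found` are only illustrative.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"type": ["A23 E I28", "I28 F A23", "D41 E F22"]})
matches = ["E", "F22"]

# Escape each item so regex metacharacters are matched literally, and try
# longer items first so a longer item is not shadowed by a shorter one
# that it starts with.
ordered = sorted(matches, key=len, reverse=True)
pattern = "|".join(re.escape(m) for m in ordered)

# One list of matches per row, in order of appearance in the string.
found = df["type"].str.findall(pattern)

# Join the matches; rows with no match give '' and are turned into NaN.
df["new_col"] = found.str.join(", ").replace("", np.nan)
```

Regex alternation does not report overlapping matches, so if two items overlap inside the same string only one of them is returned.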

help-ukraine-now
  • Thanks! How could I pass all matched substrings, not just any? (both 'E' and 'F22' for third row) – stjepan Aug 13 '19 at 19:13
  • I just want to pass the matched parts of row value (substrings). In third row of the df these would be 'E' and 'F22', because both of them are in my matches list. – stjepan Aug 13 '19 at 19:17

I would do it this way:

df["match"] = df.type.map(lambda s: "".join(set(s).intersection(matches)))  
df.loc[~df.type.str.contains("|".join(matches)), "match"] = np.nan
ivallesp
  • Thanks, I got a similar result. What I need is to pass matched substrings to a new 'match' column, but I'm unable to do this. – stjepan Aug 13 '19 at 19:08
  • Close, but please read my explanation at the end of the post(description) once again. :) – stjepan Aug 13 '19 at 19:22
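
In the `map`-based snippet above, `set(s)` is a set of single characters, so a multi-character item such as 'F22' can never be in the intersection. A possible variant that keeps the `map` approach but tests each list item as a plain substring might look like the sketch below; the helper name `matched_items` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"type": ["A23 E I28", "I28 F A23", "D41 E F22"]})
matches = ["E", "F22"]


def matched_items(text, items):
    """Return the items found as substrings of text, joined by ', ', or NaN."""
    hits = [item for item in items if item in text]
    return ", ".join(hits) if hits else np.nan


# Plain substring tests, so no regex escaping is needed.
df["match"] = df["type"].map(lambda s: matched_items(s, matches))
```

With this version the matched items appear in the order of the `matches` list rather than in the order they occur in the string.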