Add a new column with matching values in a list in pandas

Question

I have a list and a dataframe such as:

the_list =['LjHH','Lhy_kd','Ljk']

COL1 COL2 
A    ADJJDUD878_Lhy_kd
B    Y0_0099JJ_Ljk
C    YTUUDBBDHHD
D    POL0990E_LjHH'

I would like to add a new column COL3: wherever COL2 contains one of the values in the_list, COL3 should hold that matching value.

Expected result:

COL1 COL2               COL3
A    ADJJDUD878_Lhy_kd  Lhy_kd
B    Y0_0099JJ_Ljk      Ljk
C    YTUUDBBDHHD        NA
D    POL0990E_LjHH'     LjHH

Does this answer your question? [Filter pandas DataFrame by substring criteria](https://stackoverflow.com/questions/11350770/filter-pandas-dataframe-by-substring-criteria) — Vishnudev Krishnadas, Dec 27 '21 at 10:32
I think [```this```](https://stackoverflow.com/questions/26577516/how-to-test-if-a-string-contains-one-of-the-substrings-in-a-list-in-pandas) link can be helpful here. — sophocles, Dec 27 '21 at 10:34

jezrael · Answer 1 · 2021-12-27T10:36:57.770

To get only the first matched value, use Series.str.extract with the list values joined by | (regex "or"):

the_list =['LjHH','Lhy_kd','Ljk']

df['COL3'] = df['COL2'].str.extract(f'({"|".join(the_list)})', expand=False)
print (df)
  COL1               COL2    COL3
0    A  ADJJDUD878_Lhy_kd  Lhy_kd
1    B      Y0_0099JJ_Ljk     Ljk
2    C        YTUUDBBDHHD     NaN
3    D     POL0990E_LjHH'    LjHH
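
If the values in the list could contain regex metacharacters (for example `.` or `+`), joining them directly would change what the pattern matches. The following is a minimal, self-contained sketch of the same Series.str.extract approach that escapes each value with re.escape first; the dataframe is rebuilt from the sample in the question, and the `pattern` variable name is only illustrative.

```python
import re

import pandas as pd

the_list = ['LjHH', 'Lhy_kd', 'Ljk']

# Sample data rebuilt from the question
df = pd.DataFrame({
    'COL1': ['A', 'B', 'C', 'D'],
    'COL2': ['ADJJDUD878_Lhy_kd', 'Y0_0099JJ_Ljk', 'YTUUDBBDHHD', "POL0990E_LjHH'"],
})

# Escape each value so it is matched literally, then wrap the
# alternation in one capture group, which str.extract requires
pattern = '(' + '|'.join(re.escape(v) for v in the_list) + ')'

# expand=False returns a Series; rows without a match become NaN
df['COL3'] = df['COL2'].str.extract(pattern, expand=False)
print(df)
```

For the sample values the escaping should make no difference to the result; it only matters once the list contains characters with a special meaning in a regex.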

To get all matched values (if a row can contain several), use Series.str.findall with Series.str.join, then replace empty strings with NaN:

import numpy as np

the_list =['LjHH','Lhy_kd','Ljk']

df['COL3']=df['COL2'].str.findall(f'{"|".join(the_list)}').str.join(',').replace('',np.nan)
print (df)
  COL1               COL2    COL3
0    A  ADJJDUD878_Lhy_kd  Lhy_kd
1    B      Y0_0099JJ_Ljk     Ljk
2    C        YTUUDBBDHHD     NaN
3    D     POL0990E_LjHH'    LjHH
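
The sample data never has more than one match per row, so the output above looks the same as with Series.str.extract. As a rough illustration of the difference, the sketch below runs the Series.str.findall approach on hypothetical data (the value `X1_Ljk_Lhy_kd` is made up for this example and is not from the question) where one row contains two values from the list.

```python
import re

import numpy as np
import pandas as pd

the_list = ['LjHH', 'Lhy_kd', 'Ljk']

# Hypothetical data: the first row contains two values from the_list,
# the second row contains none
df = pd.DataFrame({'COL2': ['X1_Ljk_Lhy_kd', 'YTUUDBBDHHD']})

# No capture group here, so findall returns the full matches
pattern = '|'.join(re.escape(v) for v in the_list)

df['COL3'] = (
    df['COL2'].str.findall(pattern)   # list of all matches per row
              .str.join(',')          # 'Ljk,Lhy_kd' or '' when empty
              .replace('', np.nan)    # no match -> NaN
)
print(df)
```

Here the first row should end up with `Ljk,Lhy_kd` (matches appear in the order they occur in the string) and the second with NaN.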
