
I have a list and a dataframe such as:

the_list =['LjHH','Lhy_kd','Ljk']

COL1 COL2 
A    ADJJDUD878_Lhy_kd
B    Y0_0099JJ_Ljk
C    YTUUDBBDHHD
D    POL0990E_LjHH'

I would like to add a new column, COL3: for each row where COL2 contains a value from the_list, COL3 should hold the matching element of the_list.

Expected result:

COL1 COL2               COL3
A    ADJJDUD878_Lhy_kd  Lhy_kd
B    Y0_0099JJ_Ljk      Ljk
C    YTUUDBBDHHD        NA
D    POL0990E_LjHH'     LjHH
chippycentra
  • 1
    Does this answer your question? [Filter pandas DataFrame by substring criteria](https://stackoverflow.com/questions/11350770/filter-pandas-dataframe-by-substring-criteria) – Vishnudev Krishnadas Dec 27 '21 at 10:32
  • 1
    I think [```this```](https://stackoverflow.com/questions/26577516/how-to-test-if-a-string-contains-one-of-the-substrings-in-a-list-in-pandas) link can be helpful here. – sophocles Dec 27 '21 at 10:34

1 Answer


To get only the first matched value, use Series.str.extract with the values of the list joined by | (regex "or"):

the_list =['LjHH','Lhy_kd','Ljk']

df['COL3'] = df['COL2'].str.extract(f'({"|".join(the_list)})', expand=False)
print (df)
  COL1               COL2    COL3
0    A  ADJJDUD878_Lhy_kd  Lhy_kd
1    B      Y0_0099JJ_Ljk     Ljk
2    C        YTUUDBBDHHD     NaN
3    D     POL0990E_LjHH'    LjHH
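
If the values in the_list could contain regex metacharacters (such as `.` or `+`), or if one value is a prefix of another, the joined pattern may not behave as intended. Below is a minimal self-contained sketch that rebuilds the sample frame from the question. The `re.escape` call and the longest-first sort are additions of mine, not part of the original answer:

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "COL1": ["A", "B", "C", "D"],
    "COL2": ["ADJJDUD878_Lhy_kd", "Y0_0099JJ_Ljk", "YTUUDBBDHHD", "POL0990E_LjHH'"],
})
the_list = ["LjHH", "Lhy_kd", "Ljk"]

# Escape each value so it is matched literally, and try longer values first
# so a value is not shadowed by a shorter one that happens to be its prefix.
pattern = "|".join(re.escape(v) for v in sorted(the_list, key=len, reverse=True))

# The capturing group makes extract return the first match per row (NaN if none).
df["COL3"] = df["COL2"].str.extract(f"({pattern})", expand=False)
print(df)
```

For the sample data this should give the same COL3 as the unescaped version, since none of the values contain special characters.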

To get all matched values (when a row may contain more than one), use Series.str.findall with Series.str.join, and finally replace empty strings with NaN:

import numpy as np

the_list = ['LjHH', 'Lhy_kd', 'Ljk']

df['COL3'] = df['COL2'].str.findall("|".join(the_list)).str.join(',').replace('', np.nan)
print (df)
  COL1               COL2    COL3
0    A  ADJJDUD878_Lhy_kd  Lhy_kd
1    B      Y0_0099JJ_Ljk     Ljk
2    C        YTUUDBBDHHD     NaN
3    D     POL0990E_LjHH'    LjHH
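
The sample data has at most one match per row, so both approaches give the same output. To see the difference, here is a small sketch on a hypothetical Series (`s` and its values are made up for illustration) where one row contains two values from the_list:

```python
import numpy as np
import pandas as pd

the_list = ["LjHH", "Lhy_kd", "Ljk"]

# Hypothetical data: the first row contains two values from the_list,
# the second row contains none.
s = pd.Series(["X_Ljk_LjHH", "YTUUDBBDHHD"])

# findall returns a list of every match per row; join turns each list into a
# comma-separated string; rows without a match become '' and then NaN.
all_matches = s.str.findall("|".join(the_list)).str.join(",").replace("", np.nan)
print(all_matches)
```

Here the first row should come out as `Ljk,LjHH` (matches in order of appearance), whereas str.extract would return only `Ljk`.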
jezrael