I am trying to perform string matching between two pandas dataframe.
df_1:
ID Text Text_ID
1 Apple 53
2 Banana 84
3 Torent File 77
df_2:
ID File_name
22 apple_mask.txt
23 melon_banana.txt
24 Torrent.pdf
25 Abc.ppt
Objective: I want to populate the Text_ID
against File_name
in df_2 if the string in df_1['Text'] matches with df_2['File_name']. If no matches found then populate the df_2[
Text_ID] as -1. So the resultant
df` looks like
ID Flie_name Text_ID
22 apple_mask.txt 53
23 melon_banana.txt 84
24 Torrent.pdf 77
25 Abc.ppt -1
I have tried this SO thread, but it is giving a column where File_name wise fuzz score is listed.
I am trying out a non fuzzy way. Please see below the code snippets:
text_ls = df_1['Text'].tolist()
file_ls = df_2['File_name'].tolist()
text_id = []
for i,j in zip(text_ls,file_ls):
if str(j) in str(i):
t_i = df_1.loc[df_1['Text']==i,'Text_ID']
text_id.append(t_i)
else:
t_i = -1
text_id.append(t_i)
df_2['Text_ID'] = text_id
But I am getting a blank text_id
list.
Can anybody provide some clue on this? I am OK to use fuzzywuzzy as well.