Merge pandas dataframe on matched substrings

Question

I have two dataframes. df1 has a Reason column containing longer strings, and df2 has an InvoiceID column containing shorter strings. For each InvoiceID in df2, I want to find the row in df1 whose Reason contains it as a substring, then merge the two dataframes so that the result includes "Amount" from df1.

Currently I do this in Excel with INDEX/MATCH and the * wildcard.
https://contexturesblog.com/archives/2012/10/09/find-text-with-index-and-match/

import pandas as pd
data1={'ID':['A','B','A','C'],'Reason':['xxxABC1','yyyBCD2','zzzXY3',"ppp123c"],'Amount':[10,20,30,10]}
df1=pd.DataFrame(data1)
df1

data2={'ID':['A','B','A','C'],'InvoiceID':['ABC1','BCD2','XY3',"123c"]}
df2=pd.DataFrame(data2)
df2

Expected Result

data3={'ID':['A','B','A','C'],'InvoiceID':['ABC1','BCD2','XY3',"123c"],'Amount':[10,20,30,10]}
df3=pd.DataFrame(data3)
df3

I used str.contains with the following code, and it does give me a filtered df1.

df4=(df1[df1['Reason'].str.contains('ABC1', regex=False)])

  ID   Reason  Amount
0  A  xxxABC1      10

I am not sure how to pull Amount into df2 to create the new dataframe df3. I think some loop over the rows is necessary: for each row in df2, search the Reason column of df1 for that row's InvoiceID, add the matching Amount to df2 if it is found, then repeat for the next row in df2.
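The loop described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the helper name lookup_amount is hypothetical, matching is a plain case-sensitive substring test, the ID column is not used for matching, and when several Reason values contain the same InvoiceID only the first Amount is kept.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['A', 'B', 'A', 'C'],
    'Reason': ['xxxABC1', 'yyyBCD2', 'zzzXY3', 'ppp123c'],
    'Amount': [10, 20, 30, 10],
})
df2 = pd.DataFrame({
    'ID': ['A', 'B', 'A', 'C'],
    'InvoiceID': ['ABC1', 'BCD2', 'XY3', '123c'],
})

def lookup_amount(invoice_id, source=df1):
    """Hypothetical helper: Amount of the first row in `source` whose
    Reason contains invoice_id as a literal substring, else None."""
    mask = source['Reason'].str.contains(invoice_id, regex=False)
    hits = source.loc[mask, 'Amount']
    # Assumption: take the first match; unmatched invoices get None.
    return hits.iloc[0] if not hits.empty else None

# One lookup per row of df2, as in the loop described in the question.
df3 = df2.copy()
df3['Amount'] = [lookup_amount(inv) for inv in df3['InvoiceID']]
print(df3)
```

Each lookup scans all of df1, so the cost grows with len(df1) * len(df2); that may be fine for small data but slow for large tables.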

I could not get fuzzywuzzy to work for this, possibly because the Reason column can be very long in the actual data.
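In the sample data each InvoiceID appears verbatim inside Reason, so exact substring matching might be enough without fuzzy matching. One possible sketch, assuming pandas 1.2 or later (for how='cross') and tables small enough that every df2/df1 pair fits in memory, is to build all pairs and keep those where the substring test passes. The names pairs and is_match are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['A', 'B', 'A', 'C'],
    'Reason': ['xxxABC1', 'yyyBCD2', 'zzzXY3', 'ppp123c'],
    'Amount': [10, 20, 30, 10],
})
df2 = pd.DataFrame({
    'ID': ['A', 'B', 'A', 'C'],
    'InvoiceID': ['ABC1', 'BCD2', 'XY3', '123c'],
})

# Pair every df2 row with every df1 row (len(df2) * len(df1) rows).
# Only Reason and Amount are taken from df1, so df2's ID is kept as-is.
pairs = df2.merge(df1[['Reason', 'Amount']], how='cross')

# Keep the pairs where InvoiceID is a literal substring of Reason.
is_match = [inv in reason
            for inv, reason in zip(pairs['InvoiceID'], pairs['Reason'])]
df3 = pairs.loc[is_match, ['ID', 'InvoiceID', 'Amount']].reset_index(drop=True)
print(df3)
```

Unlike a row-by-row lookup, this keeps every match, so an InvoiceID found in several Reason values produces several rows, and an InvoiceID found nowhere is dropped.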

Why is this tagged `lines-of-code`? – Scott Hunter, Sep 23 '22 at 17:34
You forgot to include your attempt at solving this problem. – Scott Hunter, Sep 23 '22 at 17:34
