I'm trying to join two DataFrames based on whether the values in Table A are contained in the values of Table B.

import pandas as pd
df1 = pd.DataFrame({'Table_A': ['ABC', 'BCD-1', 'BCD-2']})
df2 = pd.DataFrame({'Table_B': ['ABC-1', 'BCD-1', 'BCD-2']})

The straight outer join below returns the following:

df3=pd.merge(df1,df2,left_on='Table_A',right_on='Table_B',how='outer')

Post Join Current Output

I'm trying to do something where it joins if the value in df1.Table_A is contained in a value of df2.Table_B.

Post Join Desired Output

This is what I was thinking, but it obviously isn't working for me.

df3=pd.merge(df1,df2,on=df1['Table_A'].isin(df2['Table_B']),how='outer')

1 Answer

This is fuzzy merging, and I wrote a function for it, fuzzy_merge:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

fuzzy_merge(df1,df2,'Table_A','Table_B')

  Table_A matches
0     ABC   ABC-1
1   BCD-1   BCD-1
2   BCD-2   BCD-2
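
The definition of fuzzy_merge is not shown above, so here is a rough sketch of the matching step only. It is not the original function: it swaps fuzzywuzzy for the standard library's difflib, works on plain lists rather than DataFrames, and the name closest_matches and the cutoff of 0.6 are assumptions made for illustration.

```python
import difflib


def closest_matches(left_values, right_values, cutoff=0.6):
    """Pair each left value with its most similar right value.

    Hypothetical helper, not the fuzzy_merge from the answer.
    Similarity is difflib's ratio (0.0 to 1.0); values scoring
    below `cutoff` are paired with None.
    """
    pairs = []
    for value in left_values:
        # get_close_matches returns candidates sorted best-first
        best = difflib.get_close_matches(value, right_values, n=1, cutoff=cutoff)
        pairs.append((value, best[0] if best else None))
    return pairs


table_a = ["ABC", "BCD-1", "BCD-2"]
table_b = ["ABC-1", "BCD-1", "BCD-2"]

for a, b in closest_matches(table_a, table_b):
    print(a, b)
# ABC ABC-1
# BCD-1 BCD-1
# BCD-2 BCD-2
```

With the two columns pulled out as lists (for example via df1['Table_A'].tolist()), the second element of each pair could be assigned back as a matches column. The cutoff probably needs tuning for real data, since short strings can score high against unrelated values.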
Erfan