1

I would like to join two data frames based on multiple columns because there are duplicate IDs in the data sets.

I have tried a few ways, one of which is listed below. However, I cannot get it right. The option below gives me all rows from both data frames. I figure this should be easy but for some reason, it is not working. I checked the results. There are matches and instead of joining on the match, I just get both rows in the final data frame.

I am comparing two different data sets to ensure the same data exists in both sets.There can be more than one transaction with the same ID but I need to make sure that all that exists in one data frame, also exists in the other.

new_df = Enterprise.merge(Tableau, 
                           left_on=['ID','AID','Amount','Tax','CC'], 
                           right_on = ['ID','AID','Amount','Tax','CC'], 
                           how='left')
Amanda Morrow
  • 33
  • 1
  • 5
  • If there are duplicate ids, what is the expected behavior of merge? Also see [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101), which may answer your question. – cs95 Dec 18 '19 at 23:18
  • Well, I am comparing two different data sets to ensure the same data exists in both sets.There can be more than one transaction with the same ID but I need to make sure that all that exists in one data frame, also exists in the other. – Amanda Morrow Dec 18 '19 at 23:53
  • So wouldn't that be a simple comparison? `if (df1.equals(df2)): print('equal')`? – cs95 Dec 19 '19 at 01:32

0 Answers0