2

I am looking to merge 3 dataframes who all have a common 'ORG_ID' column. df1 is my original dataframe and has an org_id count of 154. I would like to merge df1 with df2 and df3 in order to get the ratings for each org id in df1 and am having trouble figuring out how to do this.

I managed to merge df1 and df2 but would get a new count of 84 org_ids instead of 154 because 70 org_ids are not found in df2 but can be found in df3. I cant figure out how to merge all 3 dataframes in order to get a rating for all 154 org_ids in df1. Any ideas? Please keep in mind the dataframes are all different sizes but I am looking to have ratings for ALL 154 original Org_ids found in df1.

Here was my code to merge df1 and df2 :
df4=pd.merge(df1, df2, left_on = 'ORG_ID', right_on = 'ORG_ID')

Here is a sample of what the dataframes look like:

df1: 
Org_id   Name   
123      House 1     
457      House 3     
876      House 16      
567      House 56     

df2:
Org_id   Name       Rating
123      House 1    10   
457      House 3    5    
857      House 87   6    

df3:
Org_id   Name       Rating1   org_type        location     area
123      House 1    10        single family   Hamilton     Suburban 
567      House 56   9         single family   Middletown   Suburban
spacedinosaur10
  • 695
  • 3
  • 10
  • 24
  • You are looking for a left join. `pd.merge(df1, df2, on= 'ORG_ID', how = `'left') – A.Kot Feb 01 '17 at 16:44
  • @A.Kot I tried the following: df4=pd.merge(df1, df2, df3, on= 'ORG_ID', how = 'left'). and i got the error : TypeError: merge() got multiple values for keyword argument 'how' – spacedinosaur10 Feb 01 '17 at 16:56
  • 1
    You can't put three dataframes into the merge. This is basic database syntax. It should be `df4 = df1.merge(df2, on='ORG_ID', how='left).merge(df3, on='ORG_ID', how='left)` – A.Kot Feb 01 '17 at 16:59
  • @A.Kot thanks for the comments. So are you saying here it is not possible to merge three dataframes? the code you wrote above does not work. the issue is with the second .merge – spacedinosaur10 Feb 01 '17 at 19:13
  • If you look at the documentation for pandas.merge you will see that only one dataframe input is allowed i.e. you cannot have multiple dataframes within one merge statement. At the end of the day, you can merge as many dataframes as you want. See my solution below. – A.Kot Feb 01 '17 at 20:07

1 Answers1

2
df4 = df1.merge(df2, on='ORG_ID', how='left')
df5 = df4.merge(df3, on='ORG_ID', how='left')

df5
A.Kot
  • 7,615
  • 2
  • 22
  • 24
  • Thank you A.Kot - really appreciate your patience ! – spacedinosaur10 Feb 01 '17 at 21:45
  • Consider a chain merge with `reduce` on list of dfs: `from functools import reduce; finaldf = reduce(lambda left,right: pd.merge(left, right, on='ORG_ID', how='left'), [df1,df2,df3])` – Parfait Feb 02 '17 at 01:39
  • @Parfait Was trying to do that initially in the comments but even a chain without using a lambda function wasn't working for him. – A.Kot Feb 02 '17 at 14:40
  • In your comments, *left* is missing a second quote. – Parfait Feb 02 '17 at 14:59