2

I need a smart way to join together multiple dataframes when various combinations of them are present in my session. It doesn't matter the order in which things are done via the joins.

In this simple example, I have 4 dataframes with every combination of which may or may not exist in my environment.

For example with each comma seperation referring to a dataframe representing df1, df2, df3, df4. My data frames will always have this similar sort of spelling if that helps.

My data frames will always join on the "ID" field in a data frame.

Example scenario setup (remember there are many permutations of this):

df1 = pd.DataFrame({'ID': ['jack', 'mary', 'jill'], 'Score1': [1,2,3]})
df3 = pd.DataFrame({'ID': ['jack', 'mary','katy'],  'Score3': [111,222,333]}
df4 = pd.DataFrame({'ID': ['jack', 'mary','katy'],  'Score4': [1111,2222,3333]}

The next time around I could just have df1 & df4...etc Further example of how data frames could be setup:

  • exists, exists, exists,exists
  • no, no, no, exists
  • no, no, exists,exists
  • no, exists, exists, exists
  • etc.... I believe it is 4! x 3! combinations? There will always be at least one data frame in which cause no join would be attempted

Solution looking for:

df_final = df1.merge(.....) for all combinations of df#s that may or may not exist

jack
  • 102
  • 1
  • 6

1 Answers1

1

IIUC:

l = [df1,df3,df4]

pd.concat([i.set_index('ID') for i in l], axis=1)

Output:

      Score1  Score3  Score4
jack     1.0   111.0  1111.0
jill     3.0     NaN     NaN
katy     NaN   333.0  3333.0
mary     2.0   222.0  2222.0

or as EFT suggests:

print(pd.concat([i.set_index('ID') for i in l], axis=1, join='inner'))

Output:

      Score1  Score3  Score4
ID                          
jack       1     111    1111
mary       2     222    2222
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • And to get an inner join, just add the argument `join='inner'` to `pd.concat` in this answer. – EFT Jun 19 '17 at 22:01