4

I am trying to merge/join multiple Dataframes and so far I have no luck. I've found merge method, but it works only with two Dataframes. I also found this SO answer suggesting to do something like that:

df1.merge(df2,on='name').merge(df3,on='name')

Unfortunatelly it will not work in my case, because I have 20+ number of dataframes.

My next idea was to use join. According to the reference when joining multiple dataframes I need to use list and only I can join on index column. So I changed indexes for all of the columns (ok, it can be done grammatically easily) and end up with something like this:

df.join([df1,df2,df3])

Unfortunately, also this approach failed, because other columns names are this same in all dataframes. I've decided to do the last thing, that is renaming all columns. But when I finally joined everything: df = pd.Dataframe() df.join([df1,df2,df3])

I've received empty dataframe. I have no more idea, how I can join them. Can someone suggest anything more?

EDIT1:

Sample input:

import pandas as pd

df1 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr1', 'attr2'])

df1 
  name attr1 attr2
0    a     5    19
1    b    14    16
2    c     4     9

df2
  name attr1 attr2
0    a    15    49
1    b     4    36
2    c    14     9

Expected output:

df
  name attr1_1 attr2_1 attr1_2 attr2_2
0    a     5    19      15      49
1    b    14    16      4       36
2    c     4     9      14      9

Indexes might be unordered between dataframes, but it is guaranteed, that they will exists.

Community
  • 1
  • 1
sebap123
  • 2,541
  • 6
  • 45
  • 81
  • 3
    If the column names are the same in all the DataFrames, then what do you want the output to actually look like? – chrisaycock Jul 19 '16 at 19:46
  • @chrisaycock I've added sample input and expected output. I've based it on `merge`, where pandas automatically adds suffixes to names of other columns – sebap123 Jul 19 '16 at 19:54

3 Answers3

8

use pd.concat

dflist = [df1, df2]
keys = ["%d" % i for i in range(1, len(dflist) + 1)]

merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
merged.columns = merged.swaplevel(0, 1, 1).columns.to_series().str.join('_')

merged

enter image description here

Or

merged.reset_index()

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

use reduce:

def my_merge(df1, df2):
    return df1.merge(df2,on='name')

final_df = reduce(my_merge, df_list)

considering df_list to be a list of your dataframes

Nik P
  • 89
  • 7
-1

The solution of @piRSquared works for 20+ dataframes, see the following script for creating 20+ example dataframes:

N = 25
dflist = []

for d in range(N):
    df = pd.DataFrame(np.random.rand(3,2))
    df.columns = ['attr1', 'attr2']

    df['name'] = ['a', 'b', 'c']

    dflist.append(df)
p-robot
  • 4,652
  • 2
  • 29
  • 38