Fastest way to merge and append multiple CSVs / data frames using pandas

Question

I have the following data frames / CSVs:

df1=
Person apple ball
A      3     4   
B      5     1

df2=
Person apple cat
A      3     6  
B      5     2  

df3=
Person apple cat
C      6     2   
D      2     2

df4=
Person dog  cat 
C      1    2    
D      1    2

I am interested in knowing the fastest way to merge and append these kinds of data frames without any duplicates based on 'Person'. The expected output looks like:

output=
Person apple ball  cat  dog
A      3     4     6    nan
B      5     1     2    nan
C      6     nan   2    1
D      2     nan   2    1

Check [this link](https://stackoverflow.com/questions/38256104) — Mr. Hobo, Dec 11 '20 at 07:20

jezrael · Accepted Answer · 2020-12-11T07:38:18.363

If values of same column and index are same in all DataFrames is possible use:

It means e.g. for index=A, column=apple is for each Dataframe same value - here 3 (if exist)

dfs = [df1, df2, df3, df4]
#if Person is column, not index
dfs = [x.set_index('Person') for x in dfs]

df = pd.concat(dfs).groupby(level=0).first()
print (df)
       apple  ball  cat  dog
Person                       
A        3.0   4.0  6.0  NaN
B        5.0   1.0  2.0  NaN
C        6.0   NaN  2.0  1.0
D        2.0   NaN  2.0  1.0

sammywemmy · Answer 2 · 2020-12-11T07:30:55.247

See if this fits your use case, where the merge is both on columns and index labels, and finally concatenating the results:

first = df1.merge(df2, on=["index", "apple"])
second = df3.merge(df4, on=["index", "cat"])
pd.concat([first, second])

       apple    ball  cat  dog
index               
   A    3       4      6    3
   B    5       1      2    8
   C    6       8      2    1
   D    2       3      2    1

Fastest way to merge and append multiple CSVs / data frames using pandas

2 Answers2