3

I have the following data frames / CSVs:

df1=
Person apple ball
A      3     4   
B      5     1

df2=
Person apple cat
A      3     6  
B      5     2  

df3=
Person apple cat
C      6     2   
D      2     2

df4=
Person dog  cat 
C      1    2    
D      1    2   

I am interested in knowing the fastest way to merge and append these kinds of data frames without any duplicates based on 'Person'. The expected output looks like:

output=
Person apple ball  cat  dog
A      3     4     6    nan
B      5     1     2    nan
C      6     nan   2    1
D      2     nan   2    1
mArk
  • 630
  • 8
  • 13

2 Answers2

1

If values of same column and index are same in all DataFrames is possible use:

It means e.g. for index=A, column=apple is for each Dataframe same value - here 3 (if exist)

dfs = [df1, df2, df3, df4]
#if Person is column, not index
dfs = [x.set_index('Person') for x in dfs]

df = pd.concat(dfs).groupby(level=0).first()
print (df)
       apple  ball  cat  dog
Person                       
A        3.0   4.0  6.0  NaN
B        5.0   1.0  2.0  NaN
C        6.0   NaN  2.0  1.0
D        2.0   NaN  2.0  1.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

See if this fits your use case, where the merge is both on columns and index labels, and finally concatenating the results:

first = df1.merge(df2, on=["index", "apple"])
second = df3.merge(df4, on=["index", "cat"])
pd.concat([first, second])

       apple    ball  cat  dog
index               
   A    3       4      6    3
   B    5       1      2    8
   C    6       8      2    1
   D    2       3      2    1
sammywemmy
  • 27,093
  • 4
  • 17
  • 31