0

I have a df1, example:

     B    A    C
B         1
A              1
C    2

,and a df2, example:

    C    E    D
C        2    3
E             1
D   2

The column and row 'C' is common in both dataframes.

I would like to combine these dataframes such that I get,

    B    A    C    D    E
B        1
A             1
C   2              2    3
D                       1
E   2  

Is there an easy way to do this? pd.concat and pd.append do not seem to work. Thanks!

Edit: df1.combine_first(df2) works (thanks @jezarel), but can we keep the original ordering?

DPdl
  • 723
  • 7
  • 23
  • 2
    I think need `df1.combine_first(df2)` – jezrael Mar 26 '18 at 14:22
  • What means original ordering? Can you change sample data? – jezrael Mar 26 '18 at 14:31
  • @jezrael, the combine_first reorders the columns (and rows) as A, B, C, D, E instead of the original ordering B, A, C, D, E. (I have changed the example above to illustrate this). Thanks! – DPdl Mar 26 '18 at 14:35

1 Answers1

2

There is problem combine_first always sorted columns namd index, so need reindex with combine columns names:

idx = df1.columns.append(df2.columns).unique()
print (idx)
Index(['B', 'A', 'C', 'E', 'D'], dtype='object')

df = df1.combine_first(df2).reindex(index=idx, columns=idx)
print (df)
     B    A    C    E    D
B  NaN  1.0  NaN  NaN  NaN
A  NaN  NaN  1.0  NaN  NaN
C  2.0  NaN  NaN  2.0  3.0
E  NaN  NaN  NaN  NaN  1.0
D  NaN  NaN  2.0  NaN  NaN

More general solution:

c = df1.columns.append(df2.columns).unique()
i = df1.index.append(df2.index).unique()

df = df1.combine_first(df2).reindex(index=i, columns=c)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252