How to merge 2 Python dataframes with different columns but possible same datetime index avoiding index duplicates?

Question

I have 2 dataframes that with different column names and indexed by datetime:

df1

            A  B
time            
2011-03-01  1  6
2011-03-02  4  8
2011-03-08  5  2
2011-03-09  6  3

df2

            C  D
time            
2011-03-01  8  7
2011-03-02  9  6
2011-03-07  4  4
2011-03-08  1  2

I want to merge them to obtain something like this:

            A  B  C  D
time                          
2011-03-01  1  6  8  7
2011-03-02  4  8  9  3
2011-03-07  NaN  NaN  4  4
2011-03-08  5  2  1  2
2011-03-09  6  3  NaN  NaN

instead using the concat command:

df = pd.concat([df1, df2], axis=0).sort_index()

I get the following merged dataframe:

              A    B    C    D
time                          
2011-03-01  1.0  6.0  NaN  NaN
2011-03-01  NaN  NaN  8.0  7.0
2011-03-02  4.0  8.0  NaN  NaN
2011-03-02  NaN  NaN  9.0  6.0
2011-03-07  NaN  NaN  4.0  4.0
2011-03-08  5.0  2.0  NaN  NaN
2011-03-08  NaN  NaN  1.0  2.0
2011-03-09  6.0  3.0  NaN  NaN

with undesired duplicated indexes!

How can I merge the 2 DFs properly?

Take a look at [merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) — Emi OB, Feb 18 '22 at 13:20
You code is fine, but use `axis=1`. See [here](https://stackoverflow.com/a/71174264/16343464). — mozway, Feb 18 '22 at 13:28

score 1 · Answer 1 · answered Feb 18 '22 at 13:23

1

You could use pd.merge

pd.merge(
    df1, # Your first df 
    df2, # Second df
    how="outer", 
    left_index=True, # merging on index (your datetime)
    right_index=True, # mergin on index (your datetime)
)

answered Feb 18 '22 at 13:23

Akmal Soliev

572
4
17

score 1 · Accepted Answer · answered Feb 18 '22 at 13:27

pandas.concat is the way to go, using axis=1.

If you still have issues with axis=1, then this means your indexes don't align (maybe a different type) and you will have the same issue with join or merge.

df1 = pd.DataFrame({'A': [1,4,5,6], 'B': [6,8,2,3]},
                   index=['2011-03-01', '2011-03-02', '2011-03-08', '2011-03-09'])

df2 = pd.DataFrame({'C': [8,9,1,4], 'D': [7,6,2,4]},
                   index=['2011-03-01', '2011-03-02', '2011-03-07', '2011-03-08'])

pd.concat([df1, df2], axis=1).sort_index()

output:

              A    B    C    D
2011-03-01  1.0  6.0  8.0  7.0
2011-03-02  4.0  8.0  9.0  6.0
2011-03-07  NaN  NaN  1.0  2.0
2011-03-08  5.0  2.0  4.0  4.0
2011-03-09  6.0  3.0  NaN  NaN

score 0 · Answer 3 · answered Feb 18 '22 at 13:21

I do not access your dataframe to check my code, but here what I have come up with:

import numpy as np
df2.reset_index().merge(df1.reset_index(), on="time").fillna(np.NAN)

Explanation

You need to call reset_index() to change the index to a column. If you are not interested to do so, you can use left_index and right_index parameters in the merge function.

How to merge 2 Python dataframes with different columns but possible same datetime index avoiding index duplicates?

3 Answers3

Explanation