0

I have 2 dataframes that with different column names and indexed by datetime:

df1

            A  B
time            
2011-03-01  1  6
2011-03-02  4  8
2011-03-08  5  2
2011-03-09  6  3

df2

            C  D
time            
2011-03-01  8  7
2011-03-02  9  6
2011-03-07  4  4
2011-03-08  1  2

I want to merge them to obtain something like this:

            A  B  C  D
time                          
2011-03-01  1  6  8  7
2011-03-02  4  8  9  3
2011-03-07  NaN  NaN  4  4
2011-03-08  5  2  1  2
2011-03-09  6  3  NaN  NaN

instead using the concat command:

df = pd.concat([df1, df2], axis=0).sort_index()

I get the following merged dataframe:

              A    B    C    D
time                          
2011-03-01  1.0  6.0  NaN  NaN
2011-03-01  NaN  NaN  8.0  7.0
2011-03-02  4.0  8.0  NaN  NaN
2011-03-02  NaN  NaN  9.0  6.0
2011-03-07  NaN  NaN  4.0  4.0
2011-03-08  5.0  2.0  NaN  NaN
2011-03-08  NaN  NaN  1.0  2.0
2011-03-09  6.0  3.0  NaN  NaN

with undesired duplicated indexes!

How can I merge the 2 DFs properly?

cflayer
  • 104
  • 1
  • 9
  • Take a look at [merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) – Emi OB Feb 18 '22 at 13:20
  • You code is fine, but use `axis=1`. See [here](https://stackoverflow.com/a/71174264/16343464). – mozway Feb 18 '22 at 13:28

3 Answers3

1

You could use pd.merge

pd.merge(
    df1, # Your first df 
    df2, # Second df
    how="outer", 
    left_index=True, # merging on index (your datetime)
    right_index=True, # mergin on index (your datetime)
)
Akmal Soliev
  • 572
  • 4
  • 17
1

pandas.concat is the way to go, using axis=1.

If you still have issues with axis=1, then this means your indexes don't align (maybe a different type) and you will have the same issue with join or merge.

df1 = pd.DataFrame({'A': [1,4,5,6], 'B': [6,8,2,3]},
                   index=['2011-03-01', '2011-03-02', '2011-03-08', '2011-03-09'])

df2 = pd.DataFrame({'C': [8,9,1,4], 'D': [7,6,2,4]},
                   index=['2011-03-01', '2011-03-02', '2011-03-07', '2011-03-08'])

pd.concat([df1, df2], axis=1).sort_index()

output:

              A    B    C    D
2011-03-01  1.0  6.0  8.0  7.0
2011-03-02  4.0  8.0  9.0  6.0
2011-03-07  NaN  NaN  1.0  2.0
2011-03-08  5.0  2.0  4.0  4.0
2011-03-09  6.0  3.0  NaN  NaN
mozway
  • 194,879
  • 13
  • 39
  • 75
0

I do not access your dataframe to check my code, but here what I have come up with:

import numpy as np
df2.reset_index().merge(df1.reset_index(), on="time").fillna(np.NAN)

Explanation

You need to call reset_index() to change the index to a column. If you are not interested to do so, you can use left_index and right_index parameters in the merge function.

TheFaultInOurStars
  • 3,464
  • 1
  • 8
  • 29