Suppose I'm given a large set of CSV files and I would like to read them into Python and combine them in a certain way. Some key characteristics of the files:

  • The index is like a time dimension
  • Columns are specific entities
  • The indexes of different files may overlap, but where they do, the values should be identical
  • Not all files have the same number of columns

In the end, I would like to have one large data frame collecting all entities over the whole time window.

Let me give an example:

import numpy as np
import pandas as pd

df_1 = pd.DataFrame(data=[[np.nan, 2], [np.nan, 4], [np.nan, 8]], index=[0, 1, 2], columns=["A", "B"])
df_2 = pd.DataFrame(data=[[4, 8, 2], [4, 2, 8], [6, 1, 1]], index=[2, 3, 4], columns=["A", "B", "C"])
df_3 = pd.DataFrame(data=[[2, np.nan, 4], [8, np.nan, 8], [1, 8, 10]], index=[2, 3, 4], columns=["C", "D", "E"])
df_end = pd.DataFrame(
    data=[[np.nan, 2, np.nan, np.nan, np.nan], [np.nan, 4, np.nan, np.nan, np.nan],
          [4, 8, 2, np.nan, 4], [4, 2, 8, np.nan, 8], [6, 1, 1, 8, 10]],
    index=[0, 1, 2, 3, 4], columns=["A", "B", "C", "D", "E"],
)

So as you can see, we have three data frames, and the end result I envision is outlined in df_end. In reality the data frames are much larger, but this toy example should make clear what I am trying to achieve.

I've tried to do something like this:

temp = pd.concat([df_1, df_2, df_3])
temp = temp[~temp.index.duplicated(keep="first")]  # keep the first row per index label

which didn't produce the desired result: the deduplication keeps whole rows, so index 2 retains df_1's row (with its NaNs) instead of picking up the values from df_2 and df_3.
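
On the toy data the attempt produces roughly this, with rows 0–2 taken wholesale from df_1 and rows 3–4 from df_2:

     A    B    C   D   E
0  NaN  2.0  NaN NaN NaN
1  NaN  4.0  NaN NaN NaN
2  NaN  8.0  NaN NaN NaN
3  4.0  2.0  8.0 NaN NaN
4  6.0  1.0  1.0 NaN NaN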

swissy

1 Answer

Use combine_first in order of preference: it keeps the caller's non-NaN values and fills the gaps (missing cells, rows, and columns) from the other frame:

out = df_1.combine_first(df_2).combine_first(df_3)

Generalization with functools.reduce:

from functools import reduce

dfs = [df_1, df_2, df_3]
out = reduce(lambda a, b: a.combine_first(b), dfs)
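
To scale this to the original use case, the same fold works over frames read straight from disk. A minimal sketch, assuming the files live in a hypothetical data/ directory and the first CSV column holds the time index:

from functools import reduce
from pathlib import Path

import pandas as pd

# read every CSV, using its first column as the shared time index
files = sorted(Path("data").glob("*.csv"))
dfs = [pd.read_csv(f, index_col=0) for f in files]

# fold all frames together; earlier files take precedence on overlapping cells
out = reduce(lambda a, b: a.combine_first(b), dfs)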

Output:

     A  B    C    D     E
0  NaN  2  NaN  NaN   NaN
1  NaN  4  NaN  NaN   NaN
2  4.0  8  2.0  NaN   4.0
3  4.0  2  8.0  NaN   8.0
4  6.0  1  1.0  8.0  10.0

Alternative with concat, stacking everything and keeping the first non-NaN value per index label and column:

out = pd.concat([df_1, df_2, df_3]).groupby(level=0).first()
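
Both approaches should give the same values on the toy data; a quick sanity check, allowing for the dtype differences (int vs. float) that NaN alignment can introduce:

out_cf = df_1.combine_first(df_2).combine_first(df_3)
out_gb = pd.concat([df_1, df_2, df_3]).groupby(level=0).first()

# values agree; dtypes may differ after NaN alignment, hence check_dtype=False
pd.testing.assert_frame_equal(out_cf, out_gb, check_dtype=False)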

mozway