1

I have several dataframes inn the following format:

time, 2019-01-25
07:00-07:30, 180.22
07:30-08:00, 119.12
08:00-08:30, 11.94
08:30-09:00, 41.62
09:00-09:30, 28.69
09:30-10:00, 119.77
...

(I have many files like the above loaded into dataframes array called frames).

And I am using Pandas to merge them with the code:

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['time'],
                                                    how='outer'), frames).fillna('0.0').set_index('time')

(the code initially came from here)

The merge technically works, however, the final merged dataframe omits the time column. Does anyone know how to perform the merge as above while still keeping the time column in df_merged?

Brett
  • 11,637
  • 34
  • 127
  • 213
  • 1
    time, seem, turns into the index. add `.reset_index()` at the end – splash58 Nov 04 '19 at 19:09
  • @splash58 That did it! Put it in an answer and I'll accept. Scott Boston - your answer is also great for a performance addition. – Brett Nov 04 '19 at 19:17

1 Answers1

1

I would look at using join instead of merge in this situation.

Setup:

df1 = pd.DataFrame({'A':[*'ABCDE'], 'B':np.random.randint(0,10,5)})
df2 = pd.DataFrame({'A':[*'ABCDE'], 'C':np.random.randint(0,100,5)})
df3 = pd.DataFrame({'A':[*'ABCDE'], 'D':np.random.randint(0,1000,5)})
df4 = pd.DataFrame({'A':[*'ABCDE'], 'E':np.random.randint(0,10000,5)})


result1 = reduce(lambda l,r: pd.merge(l,r), [df1,df2,df3,df4])
result2 = df1.set_index('A').join([d.set_index('A') for d in [df2,df3,df4]]).reset_index()


all(result1 == result2)

True

Output(result1):

   A  B   C    D     E
0  A  7  19  980  8635
1  B  7  44  528   431
2  C  5   4  572  9405
3  D  7   7   96  2596
4  E  1   6  514   940

Timings:

%%timeit
result1 = reduce(lambda l,r: pd.merge(l,r), [df1,df2,df3,df4])

9.37 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
result2 = df1.set_index('A').join([d.set_index('A') for d in [df2,df3,df4]]).reset_index()

4.04 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Scott Boston
  • 147,308
  • 15
  • 139
  • 187