1

I have three dataframes

Dataframe df1:

date    A
0   2022-04-11  1
1   2022-04-12  2
2   2022-04-14  26
3   2022-04-16  2
4   2022-04-17  1
5   2022-04-20  17
6   2022-04-21  14
7   2022-04-22  1
8   2022-04-23  9
9   2022-04-24  1
10  2022-04-25  5
11  2022-04-26  2
12  2022-04-27  21
13  2022-04-28  9
14  2022-04-29  17
15  2022-04-30  5
16  2022-05-01  8
17  2022-05-07  1241217
18  2022-05-08  211
19  2022-05-09  1002521
20  2022-05-10  488739
21  2022-05-11  12925
22  2022-05-12  57
23  2022-05-13  8515098
24  2022-05-14  1134576

Dateframe df2:

date    B
0   2022-04-12  8
1   2022-04-14  7
2   2022-04-16  2
3   2022-04-19  2
4   2022-04-23  2
5   2022-05-07  2
6   2022-05-08  5
7   2022-05-09  2
8   2022-05-14  1

Dataframe df3:

date    C
0   2022-04-12  6
1   2022-04-13  1
2   2022-04-14  2
3   2022-04-20  3
4   2022-04-21  9
5   2022-04-22  25
6   2022-04-23  56
7   2022-04-24  49
8   2022-04-25  68
9   2022-04-26  71
10  2022-04-27  40
11  2022-04-28  44
12  2022-04-29  27
13  2022-04-30  34
14  2022-05-01  28
15  2022-05-07  9
16  2022-05-08  20
17  2022-05-09  24
18  2022-05-10  21
19  2022-05-11  8
20  2022-05-12  8
21  2022-05-13  14
22  2022-05-14  25
23  2022-05-15  43
24  2022-05-16  36
25  2022-05-17  29
26  2022-05-18  28
27  2022-05-19  17
28  2022-05-20  6

I would like to merge df1, df2, df3 in a single dataframe with columns date, A, B, C, in such a way that date contains all dates which appeared in df1 and/or df2 and/or df3 (without repetition), and if a particular date was not in any of the dataframes, then for the respective column I put value 0.0. So, I would like to have something like that:

date    A   B   C
0   2022-04-11  1.0 0.0 0.0
1   2022-08-12  2.0 8.0 6.0
2   2022-08-13  0.0 0.0 1.0
...

I tried to use this method

merge1 = pd.merge(df1, df2, how='outer')
sorted_merge1 = merge1.sort_values(by=['date'], ascending=False)
full_merge = pd.merge(sorted_merg1, df3, how='outer')

However, it seems it skips the dates which are not common for all three dataframes.

Ziva
  • 3,181
  • 15
  • 48
  • 80
  • merge `on` is missing? – Mohamed Thasin ah Aug 05 '22 at 09:52
  • 1
    Does this answer your question? [Merge multiple DataFrames Pandas](https://stackoverflow.com/questions/55652704/merge-multiple-dataframes-pandas) – ouroboros1 Aug 05 '22 at 09:57
  • please try adding on=['date'] in the parameters of both merges. – Triki Sadok Aug 05 '22 at 09:58
  • Refactoring the accepted answer from the suggested duplicate ([Merge multiple DataFrames Pandas](https://stackoverflow.com/questions/55652704/merge-multiple-dataframes-pandas)): `dfs = [df.set_index(['date']) for df in [df1, df2, df3]]` and then `pd.concat(dfs,axis=1).reset_index().fillna(0)`. You could of course also create a `df.merge()` chain (as [here](https://stackoverflow.com/a/73247928/18470692), but that won't scale. – ouroboros1 Aug 05 '22 at 09:58

1 Answers1

1

Try this,

print(pd.merge(df1, df2, on='date', how='outer').merge(df3, on='date', how='outer').fillna(0))

O/P:

          date          A    B     C
0   2022-04-11        1.0  0.0   0.0
1   2022-04-12        2.0  8.0   6.0
2   2022-04-14       26.0  7.0   2.0
3   2022-04-16        2.0  2.0   0.0
4   2022-04-17        1.0  0.0   0.0
5   2022-04-20       17.0  0.0   3.0
6   2022-04-21       14.0  0.0   9.0
7   2022-04-22        1.0  0.0  25.0
8   2022-04-23        9.0  2.0  56.0
9   2022-04-24        1.0  0.0  49.0
10  2022-04-25        5.0  0.0  68.0
11  2022-04-26        2.0  0.0  71.0
12  2022-04-27       21.0  0.0  40.0
13  2022-04-28        9.0  0.0  44.0
14  2022-04-29       17.0  0.0  27.0
15  2022-04-30        5.0  0.0  34.0
16  2022-05-01        8.0  0.0  28.0
17  2022-05-07  1241217.0  2.0   9.0
18  2022-05-08      211.0  5.0  20.0
19  2022-05-09  1002521.0  2.0  24.0
20  2022-05-10   488739.0  0.0  21.0
21  2022-05-11    12925.0  0.0   8.0
22  2022-05-12       57.0  0.0   8.0
23  2022-05-13  8515098.0  0.0  14.0
24  2022-05-14  1134576.0  1.0  25.0
25  2022-04-19        0.0  2.0   0.0
26  2022-04-13        0.0  0.0   1.0
27  2022-05-15        0.0  0.0  43.0
28  2022-05-16        0.0  0.0  36.0
29  2022-05-17        0.0  0.0  29.0
30  2022-05-18        0.0  0.0  28.0
31  2022-05-19        0.0  0.0  17.0
32  2022-05-20        0.0  0.0   6.0
​
  • perform merge chain and fill NaN with 0
Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111