1

i have edited this post with the specific case:

i have a list of dataframes like this (note that df1 and df2 have a row in common)

df1

index Date A
0 2010-06-19 4
1 2010-06-20 3
2 2010-06-21 2
3 2010-06-22 1
4 2012-07-19 5

df2

index Date B
0 2012-07-19 5
1 2012-07-20 6

df3

index Date C
0 2020-06-19 5
1 2020-06-20 2
2 2020-06-21 9

df_list = [df1, df2, df3]

I would like to merge all dataframes in a single dataframe, without losing rows and placing nan where there are no things to merge. The criteria would be merging them by the column 'Date' (the column should have all the dates of all the merged dataframes, ordered by date).

The resulting dataframe should look like this:

Resulting Dataframe:

index Date A B C
0 2010-06-19 4 nan nan
1 2010-06-20 3 nan nan
2 2010-06-21 2 nan nan
3 2010-06-22 1 nan nan
4 2012-07-19 5 5 nan
5 2012-07-20 nan 6 nan
6 2020-06-19 nan nan 5
7 2020-06-20 nan nan 2
8 2020-06-21 nan nan 9

I tried something like this:

from functools import reduce

df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['Date'], how='outer'), df_list)

BUT the resulting dataframe is not as expected (i miss some columns and is not ordered by date). I think i am missing something.

Thank you very much

  • 2
    Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – It_is_Chris Dec 09 '21 at 17:45
  • outer join is the way to go, but when one row is in common in 2 dataframe, it continue to complete the DF with nans even if the list have data inside – Legolas DYD Leon Kennedy Dec 09 '21 at 18:30
  • i have news. it seems that for shorter list, the command i wrote is working. For very long list of dataframe, it seem that the latest list are all filled with 0. is there some option i am missing ? – Legolas DYD Leon Kennedy Dec 09 '21 at 18:44
  • SOLVED: for overlapping datas, i had to add the option: Sort = TRUE in the lambda function. Seemed i was missing the order for big dataframes and i was only seeng the nan at end and start of frames. Thank you all ;-) – Legolas DYD Leon Kennedy Dec 09 '21 at 19:11

2 Answers2

1

For overlapping data, I had to add the option: Sort = TRUE in the lambda function. Seemed I was missing the order for big dataframes and I was only seeing the nan at the end and start of frames. Thank you all ;-)

from functools import reduce

df_merged = reduce(lambda left,right: pd.merge(left,right,on=['Date'],
        how='outer', sort=True), df_list)
anestv
  • 543
  • 6
  • 28
0

Use pandas.concat(). It takes a list of dataframes, and appends common columns together, filling new columns with NaN as necessary:

new_df = pd.concat([df1, df2, df3])

Output:

>>> new_df
   index        Date    A    B    C
0      0  2010-06-19  4.0  NaN  NaN
1      1  2010-06-20  3.0  NaN  NaN
2      2  2010-06-21  2.0  NaN  NaN
3      3  2010-06-22  1.0  NaN  NaN
0      0  2012-07-19  NaN  5.0  NaN
1      1  2012-07-20  NaN  6.0  NaN
0      0  2020-06-19  NaN  NaN  5.0
1      1  2020-06-20  NaN  NaN  2.0
2      2  2020-06-21  NaN  NaN  9.0