I would like to join three dataframes of the following structure:

import pandas as pd

january_df = pd.DataFrame({
    'January':[4,4,3,2,1,1],
    'Product_no':['B1','B2','S1','S2','B3','T1'],
    'Label':['Ball','Bikini','Shoe','Shirt','Bag','Towel'],
    'ID':[1000, 1001, 1002, 1003, 1004, 1005],
})

february_df=pd.DataFrame({
    'February':[4,3,3,2,1,1],
    'Product_no':['S1','B2','B1','T1','S2','B3'],
    'Label':['Shoe','Bikini','Ball','Towel','Shirt','Bag'],
    'ID':[1002, 1001, 1000, 1005, 1003, 1004],
})

march_df=pd.DataFrame({
    'March':[5,1,1,1,1,1],
    'Product_no':['T1','E1','S1','B3','L1','B1'],
    'Label':['Towel','Earring','Shoe','Bag','Lotion','Ball'],
    'ID':[1005, 1006, 1002, 1004, 1007, 1000],
})

The desired output for March should be:

    January  February  March  Product_no  Label    ID
-------------------------------------------------------
01  1        2         5      T1          Towel    1005
02  0        0         1      E1          Earring  1006
03  3        4         1      S1          Shoe     1002
04  1        1         1      B3          Bag      1004
05  0        0         1      L1          Lotion   1007
06  4        3         1      B1          Ball     1000

As a first step, I tried to merge March and February

all_df = pd.merge(march_df, february_df, on="ID")

but that does not give the expected result for the two months. I tried to follow the hints in "Performant cartesian product (CROSS JOIN) with pandas" and "pandas three-way joining multiple dataframes on columns", but they did not help.

In R, this can be achieved with a "piped multiple join":

threeMonths <- February%>%
      right_join(March)%>%
      left_join(January)

which I cannot seem to translate into Python.

How do I get the desired output?

Yahalnaut
    Pandas will `merge` by default on common columns between dataframes without `on` parameter defined: `february_df.merge(january_df, how='left').merge(march_df, how='right').fillna(0)` should work for you. – Scott Boston Mar 14 '21 at 00:42

1 Answer

You can merge in two steps. For example, for March:

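# Inner join on ID; the shared Product_no/Label columns get _x/_y suffixes.
tmp = pd.merge(january_df, february_df, on='ID')
# Right join keeps every March row; March's own Product_no/Label stay unsuffixed.
final_df = pd.merge(tmp, march_df, on='ID', how='right')[['January', 'February', 'March', 'Product_no', 'Label', 'ID']].fillna(0)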

print(final_df)

Prints:

   January  February  March Product_no    Label    ID
0      1.0       2.0      5         T1    Towel  1005
1      0.0       0.0      1         E1  Earring  1006
2      3.0       4.0      1         S1     Shoe  1002
3      1.0       1.0      1         B3      Bag  1004
4      0.0       0.0      1         L1   Lotion  1007
5      4.0       3.0      1         B1     Ball  1000
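
As a variation on the two-step merge, here is a minimal sketch that starts from March and left-joins the other two months on all three shared columns, then casts the filled counts back to integers. It assumes that Product_no, Label and ID always agree across the monthly frames (true for the sample data); the names `keys` and `result` are only for illustration.

```python
import pandas as pd

january_df = pd.DataFrame({
    'January': [4, 4, 3, 2, 1, 1],
    'Product_no': ['B1', 'B2', 'S1', 'S2', 'B3', 'T1'],
    'Label': ['Ball', 'Bikini', 'Shoe', 'Shirt', 'Bag', 'Towel'],
    'ID': [1000, 1001, 1002, 1003, 1004, 1005],
})

february_df = pd.DataFrame({
    'February': [4, 3, 3, 2, 1, 1],
    'Product_no': ['S1', 'B2', 'B1', 'T1', 'S2', 'B3'],
    'Label': ['Shoe', 'Bikini', 'Ball', 'Towel', 'Shirt', 'Bag'],
    'ID': [1002, 1001, 1000, 1005, 1003, 1004],
})

march_df = pd.DataFrame({
    'March': [5, 1, 1, 1, 1, 1],
    'Product_no': ['T1', 'E1', 'S1', 'B3', 'L1', 'B1'],
    'Label': ['Towel', 'Earring', 'Shoe', 'Bag', 'Lotion', 'Ball'],
    'ID': [1005, 1006, 1002, 1004, 1007, 1000],
})

# Assumption: these three columns identify a product consistently in every month.
keys = ['Product_no', 'Label', 'ID']

result = (
    march_df
    # Left joins keep every March row, in March's order.
    .merge(february_df, on=keys, how='left')
    .merge(january_df, on=keys, how='left')
    # Products missing from an earlier month get a count of 0.
    .fillna({'January': 0, 'February': 0})
    # The NaN values forced float columns; restore integer counts.
    .astype({'January': int, 'February': int})
    [['January', 'February', 'March', 'Product_no', 'Label', 'ID']]
)

print(result)
```

Joining on all shared columns avoids the `_x`/`_y` suffixed duplicates that appear when only `ID` is used as the key, at the cost of dropping a match if a label or product number ever differs between months.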
Andrej Kesely