1

I need to merge two dataframe like these: I have tried with Inner, left join but I have get duplicate values. My keys are date and categories

df1
    date         categories      cost   clicks  impression  conversion
    02-11-20    categories 5    153999   12         80        2
    03-11-20    categories 1    9366463  31        135        4
    03-11-20    categories 2    2738528  21        167        2
    03-11-20    Others          4177461  19         94        1
    03-11-20    categories 3    1747084   4         21        2
    04-11-20    categories 4    5812003  35        220        1
    04-11-20    categories 5    8490241  41        225        2    



df2
  date          categories       sales      deal
  02-11-20      categories 5     117810       1
  04-11-20      categories 4    1487500       3
  04-11-20      categories 6     299999       1
  04-11-20      Others           79106        1



desired output 
  date      categories      cost      clicks    impression  conversion  sales deal
  02-11-20  categories 5    153999      12          80          2      117810   1
  03-11-20  categories 1    9366463     31         135          4       na     na
  03-11-20  categories 2    2738528     21         167          2       na     na
  03-11-20  Others          4177461     19          94          1       na     na
  03-11-20  categories 3    1747084      4          21          2       na     na
  04-11-20  categories 4    5812003     35         220          1     1487500   3
  04-11-20  categories 5    8490241     41         225          2       na     na
  04-11-20  Others            na        na          na          na      79106   1
  04-11-20  categories 6      na        na          na          na     299999   1

thanks

sophocles
  • 13,593
  • 3
  • 14
  • 33

2 Answers2

2

You should use an outer join, and specify the 2 columns that the merged should be based on - note that you should provide the columns in a list.

The outer join, uses the keys from both frames, and NaN's are inserted for missing rows in both dataframes.

new = df1.merge(df2, on=['date','categories'], how='outer')

which prints:

        date    categories       cost  ...  conversion      sales  deal
0 2020-02-11  categories 5   153999.0  ...         2.0   117810.0   1.0
1 2020-03-11  categories 1  9366463.0  ...         4.0        NaN   NaN
2 2020-03-11  categories 2  2738528.0  ...         2.0        NaN   NaN
3 2020-03-11        Others  4177461.0  ...         1.0        NaN   NaN
4 2020-03-11  categories 3  1747084.0  ...         2.0        NaN   NaN
5 2020-04-11  categories 4  5812003.0  ...         1.0  1487500.0   3.0
6 2020-04-11  categories 5  8490241.0  ...         2.0        NaN   NaN
7 2020-04-11  categories 6        NaN  ...         NaN   299999.0   1.0
8 2020-04-11        Others        NaN  ...         NaN    79106.0   1.0
sophocles
  • 13,593
  • 3
  • 14
  • 33
0

Use pd.merge by passing your keys and use how="outer":

import pandas as pd
pd.merge(left, right, on=["key1","key2"], how="outer")

have a look at this example.

sophocles
  • 13,593
  • 3
  • 14
  • 33