0

I need to join two tables:

enter image description here

enter image description here

I use the next code:

    table = pd.merge(df1, df2, on=['user-reference_id'], how = 'right')

But it doesn´t work. I have obtained duplicated values.

The result i need to obtain is:

user-reference_id reference_date first_purchase month
159 2019-06-14 62.95 6
5009 2017-10-19 58.50 10
5026 2017-07-04 35.52 7
5032 2017-01-02 71.68 1
  • Does user-reference_id appear multiple times in one or both of your dataframes? If so, then you are getting back a cartisean product of records. – Scott Boston May 10 '21 at 14:42
  • I have user-reference_id in both dataframes and i need to join two tables by user-reference_id. In the last table i have reference_date as index and when i join two tables i get duplicates – new_datascientist May 10 '21 at 15:11

1 Answers1

1

Edit: I misunderstood your case so I have changed my answer.

The type of merge to be performed is wrong. You need to change your keyword argument on the how parameter. Change it to:

table = pd.merge(df1, df2, on=['user-reference_id'], how='outer')

There are a couple of links that are related e.g. :

Difference(s) between merge() and concat() in pandas

Pandas/Python: How to concatenate two dataframes without duplicates?

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#

  • 1
    @new_datascientist I have found a great article, which explains the how parameter. I think the documentation is not explaining it in an understandable way. See https://realpython.com/pandas-merge-join-and-concat/ – Confused Learner May 10 '21 at 18:27