I have two dataframes and I want to compare their columns. Based on this comparison I want to make a new dataframe. I have tried pd.merge and join, but neither works because they produce multiple copies of the other columns. One of the columns that I want to compare has a one-to-many relationship with the other columns of the same dataframe. Let me show you:

df1:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess1              main       offer_desc    show_more_button   click
sess2              sidebar    travellers    babies             click
sess3              main       gallery       gallery_click      click


df2:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess10             main       offer_desc    show_more_button   click
sess20             sidebar    travellers    babies             click
sess30             main       gallery       gallery_click      click


resultant:
domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess1              main       offer_desc    show_more_button   click

As you can see in the resultant df, I want to keep only the rows of df1 whose domain_sessionid also appears in df2, with all other column values taken from df1. Please suggest something.

N91
Find more information here: https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe – Erfan Mar 31 '19 at 21:32

1 Answer

You want to use .isin:

# Keep the rows of df1 whose domain_sessionid occurs anywhere in df2
df_both = df1[df1.domain_sessionid.isin(df2.domain_sessionid)]
print(df_both)

  domain_sessionid category subcategory             label action
0            sess1     main     gallery     gallery_click  click
1            sess1     main  offer_desc  show_more_button  click
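
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the tables in the question; the names mask, df_only_df1, keys and df_merged are only illustrative and are not part of the original code. It also shows the negated mask ("not in") and, as one possible alternative, a merge against the de-duplicated key column, which should avoid the duplicated columns the question ran into because only the key is taken from df2.

```python
import pandas as pd

# Sample data rebuilt by hand from the tables in the question
df1 = pd.DataFrame({
    "domain_sessionid": ["sess1", "sess1", "sess2", "sess3"],
    "category": ["main", "main", "sidebar", "main"],
    "subcategory": ["gallery", "offer_desc", "travellers", "gallery"],
    "label": ["gallery_click", "show_more_button", "babies", "gallery_click"],
    "action": ["click", "click", "click", "click"],
})
df2 = pd.DataFrame({
    "domain_sessionid": ["sess1", "sess10", "sess20", "sess30"],
    "category": ["main", "main", "sidebar", "main"],
    "subcategory": ["gallery", "offer_desc", "travellers", "gallery"],
    "label": ["gallery_click", "show_more_button", "babies", "gallery_click"],
    "action": ["click", "click", "click", "click"],
})

# Boolean mask: True where the session id of a df1 row also occurs in df2
mask = df1["domain_sessionid"].isin(df2["domain_sessionid"])

# "in": rows of df1 with a matching session id in df2
df_both = df1[mask]

# "not in": rows of df1 without a match, obtained by negating the mask
df_only_df1 = df1[~mask]

# Alternative (an assumption, not the approach above): inner merge against
# the unique keys of df2 only, so no other df2 columns are pulled in
keys = df2[["domain_sessionid"]].drop_duplicates()
df_merged = df1.merge(keys, on="domain_sessionid", how="inner")

print(df_both)
print(df_only_df1)
```

The isin version keeps the original index of df1, while the merge version returns a fresh 0..n-1 index; otherwise the two should select the same rows here.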
Erfan