Python compare series

Question

I have two dataframes and I want to compare their columns. Based on this comparison I want to build a new dataframe. I have tried pd.merge and join, but they do not work for me because they produce multiple copies of the other columns. The column I want to compare has a one-to-many relationship with the other columns of the same dataframe. Let me show you:

df1:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess1              main       offer_desc    show_more_button   click
sess2              sidebar    travellers    babies             click
sess3              main       gallery       gallery_click      click


df2:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess10             main       offer_desc    show_more_button   click
sess20             sidebar    travellers    babies             click
sess30             main       gallery       gallery_click      click


resultant:
domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess1              main       offer_desc    show_more_button   click

As you can see in the resultant df, I want to keep only the rows of df1 whose session ids also appear in df2, together with the rest of their values from df1. Please suggest something.

Find more information here: https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe — Erfan, Mar 31 '19 at 21:32

score 1 · Accepted Answer · answered Mar 31 '19 at 21:31

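You want to use .isin. It builds a boolean mask that is True for every row of df1 whose domain_sessionid also appears in df2, so filtering df1 with that mask keeps the matching rows without duplicating anything: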

df_both = df1[df1.domain_sessionid.isin(df2.domain_sessionid)]
print(df_both)

  domain_sessionid category subcategory             label action
0            sess1     main     gallery     gallery_click  click
1            sess1     main  offer_desc  show_more_button  click
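
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the two frames from the tables in the question (the cols list and the names mask, keys and df_merge are mine, not from the original post). It also shows one way to get the same rows with merge: as far as I can tell, the extra copies come from merging on a key that repeats on the right-hand side or from pulling in df2's other columns, so this version merges only against the de-duplicated key column.

```python
import pandas as pd

cols = ["domain_sessionid", "category", "subcategory", "label", "action"]

# Sample data copied from the tables in the question.
df1 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess1", "main", "offer_desc", "show_more_button", "click"],
        ["sess2", "sidebar", "travellers", "babies", "click"],
        ["sess3", "main", "gallery", "gallery_click", "click"],
    ],
    columns=cols,
)
df2 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess10", "main", "offer_desc", "show_more_button", "click"],
        ["sess20", "sidebar", "travellers", "babies", "click"],
        ["sess30", "main", "gallery", "gallery_click", "click"],
    ],
    columns=cols,
)

# Boolean mask: True where df1's session id occurs anywhere in df2.
mask = df1["domain_sessionid"].isin(df2["domain_sessionid"])
df_both = df1[mask]
print(df_both)

# Alternative (assumption: you prefer merge): join only against the unique
# session ids of df2, so no df2 columns are added and no rows are multiplied.
keys = df2[["domain_sessionid"]].drop_duplicates()
df_merge = df1.merge(keys, on="domain_sessionid", how="inner")
print(df_merge)
```

Negating the mask, df1[~mask], gives the opposite selection: the rows of df1 whose session id does not appear in df2.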
