How to find the difference between two pandas DataFrames

Question

I have two DataFrames, and both have a name column. I want to build a new DataFrame containing the rows whose name is in dataframeA but not in dataframeB.

dataframeA
id     name
 1      aaa
 2      bbbb
 3      cccc
 4      gggg

dataframeB
id     name
 1      ddd
 2      aaa
 3      gggg

new dataframe

id     name
 1      bbbb
 2      cccc

Try searching SO first as common questions are mostly answered — Prayson W. Daniel, Jul 26 '21 at 12:26

score 0 · Answer 1 · answered Jul 26 '21 at 12:33

If I understand correctly, you can merge the two DataFrames:

import pandas as pd
merged_df = pd.merge(dataframe_a, dataframe_b, on='name')
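
Note that an inner merge on name returns the names present in both DataFrames. As a minimal sketch of how a merge could instead return the names found only in the first DataFrame, the example below uses a left merge with indicator=True. The sample data is copied from the question; the names marked and only_in_a are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data from the question.
dataframe_a = pd.DataFrame({"id": [1, 2, 3, 4],
                            "name": ["aaa", "bbbb", "cccc", "gggg"]})
dataframe_b = pd.DataFrame({"id": [1, 2, 3],
                            "name": ["ddd", "aaa", "gggg"]})

# Left-merge on name only; the _merge column records where each row was found.
marked = dataframe_a.merge(
    dataframe_b[["name"]].drop_duplicates(),
    on="name",
    how="left",
    indicator=True,
)

# Keep the rows that exist only in dataframe_a and drop the helper column.
only_in_a = marked.loc[marked["_merge"] == "left_only", ["name"]]
only_in_a = only_in_a.reset_index(drop=True)
```

With the question's data, only_in_a should hold the names bbbb and cccc.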

azal


score 0 · Answer 2 · answered Jul 26 '21 at 12:33

You can use reduce from functools, or isin, to create a new_df that contains only the rows of dfA whose name is also present in dfB.

Approach 1 using reduce:

import pandas as pd
from functools import reduce  # reduce folds a two-argument function over a list

li = [dfA, dfB]  # list of DataFrames to merge
new_df = reduce(lambda left, right: pd.merge(left, right, on='name'), li)  # inner-merge each DataFrame in turn on 'name'

Approach 2 using isin:

new_df = dfA[dfA['name'].isin(dfB['name'])]
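
The isin filter as written keeps the names that appear in both DataFrames. To get the rows the question asks for (in dfA but not in dfB), the mask can be negated with ~. A minimal sketch, assuming the sample data from the question; renumbering id is an assumption based on the expected output shown there.

```python
import pandas as pd

# Sample data from the question.
dfA = pd.DataFrame({"id": [1, 2, 3, 4],
                    "name": ["aaa", "bbbb", "cccc", "gggg"]})
dfB = pd.DataFrame({"id": [1, 2, 3],
                    "name": ["ddd", "aaa", "gggg"]})

# ~ inverts the boolean mask, keeping rows whose name is NOT in dfB.
new_df = dfA[~dfA["name"].isin(dfB["name"])].reset_index(drop=True)

# Renumber id from 1 to match the expected output in the question.
new_df["id"] = range(1, len(new_df) + 1)
```

This keeps the original row order of dfA and all of its columns.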

score 0 · Answer 3 · answered Jul 26 '21 at 12:41

One way you could do this is to use Python's built-in set operations.

This converts each name column to a set, takes the set difference, and builds a new DataFrame from the result.

dataframe = pd.DataFrame(data = {
    'name': list(set(dataframeA['name'].tolist()) - set(dataframeB['name'].tolist()))
})
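
One caveat with the set approach is that sets are unordered, so the row order of the result is not guaranteed. A small sketch of one way to make it deterministic is to sort the difference before building the DataFrame. It assumes the sample data from the question; names_only_in_a is an illustrative name.

```python
import pandas as pd

# Sample data from the question.
dataframeA = pd.DataFrame({"id": [1, 2, 3, 4],
                           "name": ["aaa", "bbbb", "cccc", "gggg"]})
dataframeB = pd.DataFrame({"id": [1, 2, 3],
                           "name": ["ddd", "aaa", "gggg"]})

# Set difference: names in dataframeA that are absent from dataframeB.
names_only_in_a = set(dataframeA["name"]) - set(dataframeB["name"])

# sorted() returns a list in a stable, alphabetical order.
dataframe = pd.DataFrame({"name": sorted(names_only_in_a)})
```

Note that this also drops duplicate names and every column other than name.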
