Get the rows that are not shared between two data frames

Question

I have two data frames with the exact same column, but one of them have 1000 rows (df1), and one of them 500 (df2). The rows of df2 are also found in the data frame df1, but I want the rows that are not.

For example, lets say this is df1:

          Gender       Age
1           F           43
3           F           56
33          M           76
476         F           30
810         M           29

and df2:

          Gender       Age
3           F           56
476         F           30

I want a new data frame, df3, that have the unshared rows:

          Gender       Age
1           F           43
33          M           76
810         M           29

How can I do that ?

score 2 · Accepted Answer · answered Jan 26 '23 at 16:20

2

Use pd.Index.difference:

df3 = df1.loc[df1.index.difference(df2.index)]

answered Jan 26 '23 at 16:20

SomeDude

13,876
5
21
44

score 0 · Answer 2 · answered Jan 26 '23 at 16:25

This has many ways. I know 3 ways for it.

first way:

df = df1[~df1.index.isin(df2.index)]

second way:

left merge 2 dataframes and then filter rows that just in df1

third way:

Add a column to both dataframes that define the source and then concat two dataframes with axis=1

then countt each index in data frame and filter all index that seen 1 time and has a record with source=df1

finally: Use from first way. It is very very faster

score 0 · Answer 3 · answered Jan 26 '23 at 16:32

You can concatenate two tables and delete any rows that have duplicates:

df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)

The keep parameter ask you if you want to keep the row that has duplicates or not. If keep = True, it will keep 1 copy of the rows. If false, delete all rows that have duplicates.

Get the rows that are not shared between two data frames

3 Answers3