0

I have two datasets with the same attributes but in some of the rows the information is changed. I want to extract the rows in which the information has been changed.

  • 2
    Your question is too broad. Please post representative sample data in a format that is easily digestible. – Felix Apr 02 '19 at 15:13
  • Provide some data sample and read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Terry Apr 02 '19 at 15:15

1 Answers1

0

pandas offers a rich API that you can use to manipulate data whatever you want. merge method is one of them. merge is high performance in-memory join operations idiomatically very similar to relational databases like SQL.

df1 = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
print(df1)

   A  B
0  1  4
1  2  5
2  3  6

df2 = pd.DataFrame({'A':[1,10,3],'B':[4,5,6]})
print(df2)
    A  B
0   1  4
1  10  5
2   3  6

df3 = df1.merge(df2.drop_duplicates(),how='right', indicator=True)
print(df3)
    A  B      _merge
0   1  4        both
1   3  6        both
2  10  5  right_only

as you can see there is new column named _merge has description of how this row merged, both means that this row exists in both data frames , where right_only means that this row exist only on right data frame which in this case is df2 if you want to get only row had been changed than you can filter on _merge column

df3 = df3[df3['_merge']=='right_only']
   A   B      _merge
2  10  5  right_only

Note: you can do merge using left join by change how argument to left and this will grap every thing in left data frame (df1) and if row exists in df2 as well then _merge column will show up both. take a look at here for more details

Kero
  • 1,924
  • 5
  • 33
  • 53