How to compare two dataframes row by row?

Question

I have a data frame of shape 152431 x 15 and I want the difference between two such frames, i.e. the rows that appear in one but not the other.

# df1:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green

# df2:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple  22.1 Red
2013-11-25 Orange  8.6 Orange

Is this what you want? https://stackoverflow.com/questions/17095101/outputting-difference-in-two-pandas-dataframes-side-by-side-highlighting-the-d — Phung Duy Phong, Feb 21 '20 at 10:23

score 0 · Answer 1 · answered Feb 21 '20 at 11:12

If your dataframes are stored in two files, I would read the lines of each file and build a list of the lines that appear in the new file but not in the old one:

old_file_path = 'INSERT_FILE_PATH_OF_FILE_A'
new_file_path = 'INSERT_FILE_PATH_OF_FILE_B'

with open(old_file_path, 'r', encoding='utf-8') as old, open(new_file_path, 'r', encoding='utf-8') as new:
    # A set makes each membership test O(1) instead of scanning a list.
    old_lines = set(old.readlines())
    new_lines = new.readlines()

# Collect every line of the new file that does not appear in the old file.
total_of_changes = [line for line in new_lines if line not in old_lines]

emiljoj

Nooooo, please don't do that! Especially when using pandas, there are **far** better options than reading and comparing each file **line-by-line**. With 152k rows, this is absolutely inefficient and furthermore unpythonic and clumsy. – JE_Muc Feb 21 '20 at 12:11
Fair enough, a more pythonic approach would help me too. Did you have a specific function in mind? :) – emiljoj Feb 21 '20 at 14:07
Yes, Chris A posted a nice solution in his comment: `pd.concat([df1, df2]).drop_duplicates(keep=False)` – JE_Muc Feb 21 '20 at 14:19
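
For reference, here is a minimal sketch of that one-liner applied to the sample data from the question. The frames are rebuilt by hand for illustration (the `extra` frame and the variable names `diff` and `only_in_df2` are my own, not from the original posts), and the second variant using `merge` with `indicator=True` is an alternative I am adding, not something suggested in the thread.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})

# Hypothetical helper frame: the two rows that only df2 contains.
extra = pd.DataFrame({
    "Date": ["2013-11-25"] * 2,
    "Fruit": ["Apple", "Orange"],
    "Num": [22.1, 8.6],
    "Color": ["Red", "Orange"],
})
df2 = pd.concat([df1, extra], ignore_index=True)

# Approach from the comment: stack both frames and drop every row
# that occurs more than once, keeping only the unmatched rows.
diff = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(diff)

# Alternative (my assumption, not from the thread): an outer merge on all
# shared columns, with an indicator column saying which frame each row came from.
merged = df1.merge(df2, how="outer", indicator=True)
only_in_df2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
print(only_in_df2)
```

One caveat with `drop_duplicates(keep=False)`: a row that is repeated inside a single frame is dropped as well, even if the other frame does not contain it. The `merge` variant avoids that and also tells you which side a row is missing from.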
