Best way of comparing two files in python and why?

Question

I have two files, a.txt and b.txt, and I am trying to compare them using a hash, as below.

import hashlib

# Hash each file's bytes and compare the digests
with open('a.txt', 'rb') as f1, open('b.txt', 'rb') as f2:
    file1 = hashlib.md5(f1.read()).hexdigest()
    file2 = hashlib.md5(f2.read()).hexdigest()
file1 == file2  # True or False

This is one way. We can also use filecmp, as below.

import filecmp
filecmp.cmp('a.txt', 'b.txt')  # True or False

Which of these two approaches is better, and why?

What do you need to compare? Line by line, or each line against the whole document? Could you please give us an example? — Carlo 1585, Jul 31 '18 at 11:22
Possible duplicate of [Is MD5 still good enough to uniquely identify files?](https://stackoverflow.com/questions/4032209/is-md5-still-good-enough-to-uniquely-identify-files) — awesoon, Jul 31 '18 at 11:22
I'd encourage you to compare it yourself by timing it (with different file sizes). — Eduard, Jul 31 '18 at 11:23
Note that by default (with `shallow=True`) `cmp` will compare the files by their `os.stat` signatures. With `shallow=False` it will compare the files' contents. — awesoon, Jul 31 '18 at 11:23
@soon How is that question a duplicate? My question was about a comparison between these two. — praveen jp, Aug 01 '18 at 08:28
I think no one understood the question except @user803422. Thanks, man. — praveen jp, Aug 01 '18 at 08:30

score 2 · Accepted Answer · edited Jan 03 '19 at 17:10

filecmp.cmp('a.txt','b.txt', shallow=False) is just what you need for comparing 2 files.

hashlib.md5() will add complexity, be more CPU intensive, and take longer. Most importantly, it will give a wrong result when two different files happen to have the same MD5 hash (a collision), whereas filecmp with shallow=False compares the actual bytes.
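To illustrate why `shallow=False` matters, here is a minimal sketch. The helper names `same_content` and `demo` are hypothetical, made up for this example. It builds two temporary files of the same size with different content, forces their timestamps to match, and compares them both ways.

```python
import filecmp
import os
import tempfile


def same_content(path_a, path_b):
    """Return True only if both files hold identical bytes."""
    # shallow=False forces a byte comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


def demo():
    """Compare two same-sized files with different content, both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.txt")
        path_b = os.path.join(tmp, "b.txt")
        with open(path_a, "wb") as f:
            f.write(b"hello")
        with open(path_b, "wb") as f:
            f.write(b"world")

        # Arbitrary fixed timestamp (an assumption for the demo) so that
        # both files end up with the same type, size, and mtime.
        stamp = 1_500_000_000 * 10**9
        os.utime(path_a, ns=(stamp, stamp))
        os.utime(path_b, ns=(stamp, stamp))

        shallow_result = filecmp.cmp(path_a, path_b)  # default shallow=True
        deep_result = same_content(path_a, path_b)
        return shallow_result, deep_result
```

With matching stat signatures, the default shallow comparison reports the files as equal even though their bytes differ, while the `shallow=False` call reads the content and reports them as different.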

edited Jan 03 '19 at 17:10

user7610


answered Jul 31 '18 at 11:26

user803422

