I have two files, a.txt and b.txt, and I am trying to compare them by hashing, as shown below.

import hashlib

# Hash both files and compare the digests
with open('a.txt', 'rb') as f1, open('b.txt', 'rb') as f2:
    file1 = hashlib.md5(f1.read()).hexdigest()
    file2 = hashlib.md5(f2.read()).hexdigest()
file1 == file2  # True or False

That is one way. Another is to use filecmp, as below.

import filecmp
filecmp.cmp('a.txt', 'b.txt')  # True or False

Which of these two approaches is better, and why?

praveen jp
  • What do you need to compare: line by line, or every line in the whole document? Could you please give us an example? – Carlo 1585 Jul 31 '18 at 11:22
  • Possible duplicate of [Is MD5 still good enough to uniquely identify files?](https://stackoverflow.com/questions/4032209/is-md5-still-good-enough-to-uniquely-identify-files) – awesoon Jul 31 '18 at 11:22
  • I'd encourage you to compare it yourself by timing it (with different file sizes). – Eduard Jul 31 '18 at 11:23
  • Note that by default (with `shallow=True`), `cmp` treats files whose `os.stat` signatures (file type, size, and modification time) match as equal without reading them. With `shallow=False` it compares the files' contents. – awesoon Jul 31 '18 at 11:23
  • @soon How is that question a duplicate? My question was about the comparison between these two approaches. – praveen jp Aug 01 '18 at 08:28
  • I think no one understood the question except @user803422. Thanks, man. – praveen jp Aug 01 '18 at 08:30

1 Answer

filecmp.cmp('a.txt', 'b.txt', shallow=False) is just what you need to compare two files.
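
As a minimal sketch of that call (the helper name same_content and the temporary files are made up for illustration, they are not from the question):

```python
import filecmp
import os
import tempfile


def same_content(path_a, path_b):
    """Return True if both files hold identical bytes (hypothetical helper)."""
    # shallow=False makes filecmp read and compare the contents
    # instead of trusting matching os.stat() metadata.
    return filecmp.cmp(path_a, path_b, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")
    # a and b get the same bytes, c gets different bytes of the same length.
    for path, data in ((a, b"hello\n"), (b, b"hello\n"), (c, b"world\n")):
        with open(path, "wb") as fh:
            fh.write(data)

    equal_ab = same_content(a, b)
    equal_ac = same_content(a, c)
    print(equal_ab, equal_ac)  # True False
```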

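hashlib.md5() adds complexity, is more CPU-intensive, and takes longer, because both files must be read and hashed in full, whereas filecmp can stop at the first difference. Most importantly, it gives a wrong result if two different files happen to have the same MD5 hash (a collision).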
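
If you still want to try the hashing route, here is a rough sketch that reads each file in chunks rather than all at once, next to the filecmp call for comparison. The helper md5_of, the 64 KiB chunk size, and the test files are assumptions for illustration only:

```python
import filecmp
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=65536):
    """Hash a file chunk by chunk so it is never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    with open(a, "wb") as fh:
        fh.write(b"x" * 200_000)
    with open(b, "wb") as fh:
        fh.write(b"y" + b"x" * 199_999)  # same size, differs at the first byte

    # Hashing has to read both files to the end before it can answer.
    by_hash = md5_of(a) == md5_of(b)
    # filecmp compares block by block and returns at the first mismatch.
    by_cmp = filecmp.cmp(a, b, shallow=False)
    print(by_hash, by_cmp)
```

Both approaches report that these two files differ. Timing them on your own files, as suggested in the comments, would show how much the early exit matters in practice.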

user7610
user803422