
I'm getting different hashes for two .png files generated from the same PDF. The files were created using ImageMagick convert v6.9.1-10.

File creation:

$ convert test.pdf test_one.png
$ convert test.pdf test_two.png

Python:

import hashlib

# Open in binary mode ('rb'): a PNG is binary data, and text mode can mangle or fail to decode it.
with open('test_one.png', 'rb') as f:
    first_hash = hashlib.md5(f.read()).hexdigest()

with open('test_two.png', 'rb') as f:
    second_hash = hashlib.md5(f.read()).hexdigest()

I would expect first_hash to be the same as second_hash, but that is not the case.

Why are the hashes not the same?

dosborn
  • Have you checked the obvious: that the two files really are identical on disk, and that changing the second update to read test_one.png gives the same hash as the first? The second test confirms that the process is repeatable and that hashlib doesn't add some random salt. – Ian4264 Jul 06 '20 at 19:08
  • Thanks for the suggestions. I've just come across this answer which provided me with what I needed: https://stackoverflow.com/questions/2654281/how-to-remove-exif-data-without-recompressing-the-jpeg. I just needed to add `-strip` to the convert command. I think the .png files must have different timestamps embedded in them. – dosborn Jul 06 '20 at 19:25

1 Answer


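Found the answer here: How to remove EXIF data without recompressing the JPEG? (https://stackoverflow.com/questions/2654281/how-to-remove-exif-data-without-recompressing-the-jpeg)

The two images contain different embedded metadata, most likely the timestamps that convert writes into each file, so their bytes differ even when the image content is the same.

Adding the -strip flag to the convert command (for example, convert test.pdf -strip test_one.png) removes that metadata, and the hashes come out identical.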
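
If regenerating the files is not an option, one way to confirm that only metadata differs is to walk the PNG chunks directly. The following is a minimal sketch using only the standard library; the helper names iter_chunks, image_hash, and differing_chunk_types are made up for illustration, and it assumes both files were encoded with identical settings so that their image data chunks match byte for byte.

```python
import hashlib
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types that define the image itself, per the PNG specification.
CRITICAL_CHUNKS = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}


def iter_chunks(data):
    """Yield (chunk_type, chunk_data) pairs from the raw bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)


def image_hash(data):
    """MD5 over the critical chunks only, ignoring text and time metadata."""
    h = hashlib.md5()
    for chunk_type, chunk_data in iter_chunks(data):
        if chunk_type in CRITICAL_CHUNKS:
            h.update(chunk_type)
            h.update(chunk_data)
    return h.hexdigest()


def differing_chunk_types(data_a, data_b):
    """Return the sorted chunk types whose contents differ between two PNGs."""
    def group(data):
        grouped = {}
        for chunk_type, chunk_data in iter_chunks(data):
            grouped.setdefault(chunk_type, []).append(chunk_data)
        return grouped

    a, b = group(data_a), group(data_b)
    return sorted(t.decode("ascii") for t in set(a) | set(b) if a.get(t) != b.get(t))
```

Reading each file with open(name, 'rb').read() and passing the bytes to differing_chunk_types shows which chunk types differ; if only ancillary chunks such as tEXt or tIME are listed, image_hash should return the same digest for both files.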

dosborn