
I'm looking for a way to create a unique hash for images in Python and PHP.

I thought about using MD5 sums of the original files because they can be generated quickly, but whenever I update the EXIF information (sometimes the timezone is off), the file's MD5 sum changes too.

Are there any other ways I can create a hash for these files that will not change when the EXIF info is updated? Efficiency is a concern, as I will be creating hashes for ~500k 30MB images.

Maybe there's a way to create an MD5 hash of the image that excludes the EXIF part (I believe it's written at the beginning of the file?). Thanks in advance; example code is appreciated.

ensnare
  • Possible duplicate: https://stackoverflow.com/questions/10075065/compute-hash-of-only-the-core-image-data-excluding-metadata-for-an-image – Rolf Mar 30 '19 at 20:39
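For JPEG files specifically, the idea in the question can be done without decoding any pixels. EXIF (and XMP) metadata is stored in APP1 segments in the file header, so a small parser can walk the header segments, leave APP1 out of the digest, and stream everything from the start-of-scan (SOS) marker onward through the hash. The sketch below is a hypothetical helper (jpeg_digest_without_app1 is not from any library). It assumes a well-formed JPEG whose header segments all carry a length field, and it uses MD5 only because the question does. Other metadata segments, such as APP13 (IPTC), are still hashed, and any tool that rewrites non-APP1 segments will still change the digest.

```python
import hashlib
from typing import BinaryIO

SOI = b"\xff\xd8"  # start of image
APP1 = 0xE1        # EXIF and XMP metadata segments
SOS = 0xDA         # start of scan: compressed pixel data follows
EOI = 0xD9         # end of image


def jpeg_digest_without_app1(f: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """MD5 of a JPEG stream with every APP1 (EXIF/XMP) segment left out.

    Assumes a well-formed JPEG whose header segments all carry a 2-byte length.
    """
    h = hashlib.md5()
    if f.read(2) != SOI:
        raise ValueError("not a JPEG (missing SOI marker)")
    h.update(SOI)
    while True:
        if f.read(1) != b"\xff":
            raise ValueError("expected a JPEG marker")
        marker_byte = f.read(1)
        while marker_byte == b"\xff":  # skip optional fill bytes
            marker_byte = f.read(1)
        if not marker_byte:
            raise ValueError("truncated JPEG")
        marker = marker_byte[0]
        if marker == EOI:  # image ended before any scan data
            h.update(b"\xff" + marker_byte)
            break
        length = f.read(2)
        if len(length) != 2:
            raise ValueError("truncated JPEG")
        body = f.read(int.from_bytes(length, "big") - 2)
        if marker != APP1:  # hash every header segment except APP1
            h.update(b"\xff" + marker_byte + length + body)
        if marker == SOS:
            # The rest of the file is scan data: stream it in fixed-size chunks
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
            break
    return h.hexdigest()
```

Usage would be along the lines of `with open(path, "rb") as fh: digest = jpeg_digest_without_app1(fh)`. Only the header segments are held in memory and the scan data is read in 1 MiB chunks, so memory stays flat even for 30 MB files and the cost is dominated by disk I/O rather than decoding.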

2 Answers


ImageMagick already provides a method to get the image signature (Imagick::getImageSignature() in PHP). According to the PHP documentation:

Generates an SHA-256 message digest for the image pixel stream.

So my understanding is that the signature isn't affected by changes to the EXIF information.

I've also checked that the PythonMagick.Image.signature method is available in the Python bindings, so you should be able to use it from both languages.

jcollado

In Python, you could use Image.tobytes() (called Image.tostring() in the original PIL; renamed in Pillow) to compute the MD5 hash of the image data only, without the metadata.

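```python
import hashlib

from PIL import Image  # Pillow; the original PIL used "import Image"

filename = "photo.jpg"  # path of the image to hash

# Decode to a fixed pixel mode so only pixel data (no metadata) is hashed
img = Image.open(filename).convert('RGBA')
m = hashlib.md5()
m.update(img.tobytes())  # tostring() in the original PIL
print(m.hexdigest())
```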
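To check the central claim on your own setup, the following sketch writes the same pixels to two in-memory JPEGs that differ only in their EXIF DateTime tag (0x0132), then compares a whole-file MD5 with a pixel MD5 computed as in the answer. It assumes a Pillow version that provides Image.Exif (roughly Pillow 6.0 or later), and the helper names pixel_md5 and jpeg_with_datetime are made up for this example. Pixel equality here relies on the same encoder producing the same scan data for identical input and settings; decoding the same file with a different library (for example ImageMagick or GD from PHP) is not guaranteed to give bit-identical pixels, so pixel hashes should not be assumed comparable across languages.

```python
import hashlib
import io

from PIL import Image


def pixel_md5(fp) -> str:
    """MD5 of the decoded RGBA pixels (hypothetical helper; metadata is not hashed)."""
    with Image.open(fp) as img:
        return hashlib.md5(img.convert("RGBA").tobytes()).hexdigest()


def jpeg_with_datetime(src: Image.Image, stamp: str) -> bytes:
    """Encode src as JPEG with EXIF DateTime (tag 0x0132) set to stamp (hypothetical helper)."""
    exif = Image.Exif()
    exif[0x0132] = stamp
    buf = io.BytesIO()
    src.save(buf, "JPEG", quality=90, exif=exif.tobytes())
    return buf.getvalue()


original = Image.new("RGB", (64, 48), (200, 120, 40))
a = jpeg_with_datetime(original, "2011:12:25 10:00:00")
b = jpeg_with_datetime(original, "2011:12:25 15:00:00")

# The files differ byte-for-byte, so a whole-file MD5 changes...
print(hashlib.md5(a).hexdigest() != hashlib.md5(b).hexdigest())
# ...while the decoded pixels, and therefore the pixel hash, stay the same.
print(pixel_md5(io.BytesIO(a)) == pixel_md5(io.BytesIO(b)))
```

Keep in mind that decoding every image costs far more than hashing its raw bytes, which adds up at ~500k files of 30 MB each.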
unutbu
  • Is there a way I can generate an identical hash in php? I use both languages in my application. Thanks. – ensnare Dec 25 '11 at 21:13
  • You could consider using PHP's GD library to create temporary JPEGs (which would have the EXIF data removed) and then hash the resulting binary. However, this is not going to be a high-efficiency process for 30MB images... unless you have a huge amount of RAM, this is going to be a little slow. – Ben D Dec 25 '11 at 21:29