I have to store a large number of scientific images generated by a robotic microscope. While storing them, I would also like to write a hash of the image data into the Exif metadata, so that every image is identifiable and I can determine whether it was modified afterwards. The image data comes as a 2D array of 16-bit unsigned integers. The code I am attempting to use is:

import hashlib, io, json
import numpy as np
import piexif
import piexif.helper
from PIL import Image

def MOEDAL_IMGWR_JPG_h(img,fname,q,js='{}',exif_ifd ={}):
        y=np.asarray(img);
        z = (65535*((y - y.min())/y.ptp())).astype(np.uint16)
        a=(np.array(z)//256).astype("uint8");
        im = Image.fromarray(a)
        im.save(fname, format='JPEG', quality=q) # save the image
        im=Image.open(fname)  # reload it
        img_byte_arr = io.BytesIO()
        im.save(img_byte_arr,format='PNG') # write the image content in memory
        jdc=json.loads(js)
        jdc['sha244']=hashlib.sha224(img_byte_arr.getvalue()).hexdigest()
        a=json.dumps(jdc)
        exif_dict = {"Exif":exif_ifd}
        exif_dict["Exif"][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump(a)
        exif_bytes = piexif.dump(exif_dict)
        piexif.insert(exif_bytes,fname)  # This method is supposed to work (but does not)
#            im.save(fname, exif=exif_bytes) #

As you can see, after saving the image with the desired compression I reload the image data, calculate the hash, and store it (as a JSON key) in the UserComment field of the Exif metadata. For the second save (the one in memory) I also tried a compressed format.
But when I reload the image from the Python CLI and calculate the hash again, I obtain a different value.

>>> im=Image.open('Test.jpg')
>>> img_byte_arr = io.BytesIO()
>>> im.save(img_byte_arr,format='PNG')
>>> exif_dict=piexif.load('TEST.jpg')
>>> print(exif_dict)
{'0th': {34665: 26}, 'Exif': {37510: b'ASCII\x00\x00\x00{"sha244": "94ae6bcfbb94c75c8adf65536993a03a107aa076cb94e20ef6bdff12"}'}, 'GPS': {}, 'Interop': {}, '1st': {}, 'thumbnail': None}
>>> hashlib.sha224(img_byte_arr.getvalue()).hexdigest()
'ef5966d665aaefc0d5b48c293957f66007c8dcaab7afc39f85a3964e'

Maybe the last im.save is wrong, but I have also tried specifying format='JPEG' and quality=100 so that the compression would not be repeated. Any suggestions? Thanks, G.L.

GBBL
  • I have also tried the method described here: https://stackoverflow.com/questions/53543549/change-exif-data-on-jpeg-without-altering-picture but it does not seem to work – GBBL Jun 20 '21 at 22:40
  • This won't work. JPEG readers and writers are allowed to make tradeoffs to save space (image size in bytes on disk) or to speed up the process. Each library, and each new version of each library may make a different tradeoff which will render your hash incorrect. Can you use lossless PNG? – Mark Setchell Jun 20 '21 at 23:00
  • I guess you *could* hash the entire JPEG once created and save the hash as `xattr` or in the filename or in a database or externally somehow - Mac resource fork or Windows NTFS Alternate Data Stream. – Mark Setchell Jun 20 '21 at 23:05
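
The suggestion in the last comment, hashing the whole JPEG file once it has been written and keeping the hash outside the image, could be sketched roughly as below. This is only an illustration, not something from the thread: the JSON sidecar file, its `.sha224.json` suffix, and the helper names `file_sha224`, `write_sidecar` and `check_sidecar` are all assumptions made for the example; an `xattr`, the filename, or a database would serve the same purpose.

```python
import hashlib
import json
from pathlib import Path


def file_sha224(path, chunk_size=1 << 20):
    """Return the SHA-224 hex digest of the file's raw bytes."""
    h = hashlib.sha224()
    with open(path, "rb") as f:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sidecar_path(path):
    """Hypothetical naming scheme: image.jpg -> image.jpg.sha224.json"""
    path = Path(path)
    return path.with_name(path.name + ".sha224.json")


def write_sidecar(path):
    """Store the file hash next to the image, in a separate JSON file."""
    path = Path(path)
    record = {"file": path.name, "sha224": file_sha224(path)}
    sidecar = sidecar_path(path)
    sidecar.write_text(json.dumps(record))
    return sidecar


def check_sidecar(path):
    """Return True if the file still matches the hash stored in its sidecar."""
    record = json.loads(sidecar_path(path).read_text())
    return record["sha224"] == file_sha224(path)
```

Because the hash covers every byte of the file, it no longer depends on how a particular JPEG decoder reconstructs the pixels, but any later change to the file, including a metadata edit, will also change it.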

2 Answers

This is not going to work, not this way. You calculate the hash of the image, then save it as a JPEG stream, and JPEG is lossy. So your image will change right then and there.

What you need to do is to save the image as JPEG, reload it and calculate the hash of the reloaded image, then update the EXIF metadata without further altering the image (normally EXIF tools do not alter the image stream, but I am not conversant with Python libraries; you should check).

LSerni
  • As you can read in the code, that is exactly what I am trying to do. I store the image, then reload it and try to store the Exif metadata. Do you know how to store the metadata without storing the image again? – GBBL Jun 20 '21 at 21:56
  • See e.g. https://stackoverflow.com/questions/44636152/how-to-modify-exif-data-in-python – GBBL Jun 20 '21 at 22:00
  • Seems that part of the answer is HERE: https://stackoverflow.com/questions/53543549/change-exif-data-on-jpeg-without-altering-picture – GBBL Jun 20 '21 at 22:19
  • Sorry, I got confused by the two saves. What is causing problems is in all likelihood the final im.save() with modified Exif. But from what I found online, this library should work. https://pypi.org/project/exif/ – LSerni Jun 20 '21 at 22:19
  • In fact I have also tried piexif.insert(exif_bytes,fname), which inserts the Exif data, but the image data still seems to be modified. – GBBL Jun 20 '21 at 22:34

I have finally found the errors in my function, and I am posting the corrected version because I think it can be useful for others too. There were two errors. The first was the original save

#            im.save(fname, exif=exif_bytes) #

which is correctly replaced by

piexif.insert(exif_bytes,fname)

The second, more subtle error was in how the image bytes were extracted to compute the hash. The correct method (or at least one that works) is to convert the image content (read as grayscale) into a NumPy array and then convert the array into bytes. The resulting line is:

jdc['sha244']=hashlib.sha224(np.float32(im).tobytes()).hexdigest()

So the complete function is:

def MOEDAL_IMGWR_JPG_h(img, fname, q, js='{}', exif_ifd=None):
        y = np.asarray(img)
        # np.ptp(y) is used instead of y.ptp(), which NumPy 2.0 removed
        z = (65535*((y - y.min())/np.ptp(y))).astype(np.uint16)  # stretch to the full 16-bit range
        a = (z//256).astype("uint8")  # reduce to 8 bits for JPEG
        im = Image.fromarray(a)
        im.save(fname, format='JPEG', quality=q) # save the image
        im = Image.open(fname)  # load the saved image
        jdc = json.loads(js)    # prepare the JSON record from js
        jdc['sha244'] = hashlib.sha224(np.float32(im).tobytes()).hexdigest() # compute the hash from the np array converted to bytes
        a = json.dumps(jdc)
        exif_dict = {"Exif": dict(exif_ifd or {})}  # copy, so the caller's dict is not modified
        exif_dict["Exif"][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump(a)
        exif_bytes = piexif.dump(exif_dict)
        piexif.insert(exif_bytes, fname) # insert the Exif data without re-encoding the image
GBBL