Byte representation of an image differs depending on method used to read it

Question

I was trying to perform some data augmentation for object detection models in TensorFlow, so I was checking the compatibility of different image representations.

First I read an image file using PIL (Pillow, to be precise):

import io
import numpy as np
import PIL.Image

full_path = 'path/to/my/image.jpg'
image = PIL.Image.open(full_path)
image_np = np.array(image)
encoded_jpg_io1 = io.BytesIO(image_np)

Then I used the TensorFlow way of reading the file (which is also used to create tfrecords):

import tensorflow as tf

with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
encoded_jpg_io2 = io.BytesIO(encoded_jpg)

And then I compared the two results:

if encoded_jpg_io1 == encoded_jpg_io2:
    print('Equal')

I was expecting those two to be equal, but nothing is printed. Why is this not the case here?

If I compare the underlying bytes instead, I get the same result (still not equal):

v1 = encoded_jpg_io1.getvalue()
v2 = encoded_jpg_io2.getvalue()
if encoded_jpg_io1.getvalue() == encoded_jpg_io2.getvalue():
    print('Equal')
if v1.__eq__(v2):
    print('Equal')

I need to manipulate my images with numpy and then create some tfrecords, so the equality is required.

Some interesting facts:
1. PIL cannot read the buffer built from the np.array at all:

image1 = PIL.Image.open(encoded_jpg_io1)

OSError: cannot identify image file

While using GFile works fine:

image2 = PIL.Image.open(encoded_jpg_io2)

2. A PIL image cannot be directly converted to BytesIO:

encoded_jpg_io1 = io.BytesIO(image)

TypeError: a bytes-like object is required, not 'JpegImageFile'
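
For comparison, a PIL image can be written into a BytesIO buffer by encoding it with `Image.save` and an explicit format, rather than by passing the image object to the `BytesIO` constructor. Below is a minimal sketch of that. It assumes a small synthetic image in place of the real file; the `pixels` array and its contents are made up for illustration.

```python
import io

import numpy as np
from PIL import Image

# Hypothetical stand-in for the opened image: a small synthetic RGB picture.
pixels = np.zeros((32, 48, 3), dtype=np.uint8)
pixels[:, :24] = (255, 0, 0)  # left half red
image = Image.fromarray(pixels)

# Encode the image into an in-memory buffer instead of wrapping the object.
buf = io.BytesIO()
image.save(buf, format="JPEG")
encoded = buf.getvalue()

# The buffer now holds a JPEG stream, which PIL can identify and decode.
reopened = Image.open(io.BytesIO(encoded))
print(reopened.format, reopened.size)
```

Note that JPEG is lossy, so bytes produced this way would generally not be identical to the bytes of the original file, even if the pixels were left untouched.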

Why should objects of different classes which are defined in different modules correspond to the same sequence of bytes just because you called the respective constructors on the same file? Perhaps I don't understand what you are doing, but I don't know why you would expect `encoded_jpg_io1 == encoded_jpg_io2` to evaluate to True. In any event, perhaps [this answer](https://stackoverflow.com/a/44606972/4996248) will help. — John Coleman, Mar 28 '18 at 11:22
I am not checking any random modules as you say. They both process the same file and then use the same final module to produce the bytes. — Eypros, Mar 28 '18 at 11:31
`encoded_jpg_io1` was created via PIL and numpy and `encoded_jpg_io2` was created via tensorflow. Since these are fairly different pathways I am not surprised that they are different. The link that I gave above suggests that row-major vs column-major is one potential issue in converting a PIL image to a numpy array. — John Coleman, Mar 28 '18 at 11:52
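
The difference between the two pathways can probably be sketched like this: reading the file returns the compressed JPEG stream, while `np.array(image)` holds the decoded, uncompressed pixel values. This is only an illustration under assumptions. The synthetic image stands in for the real file, and `image_np.tobytes()` is used here as the raw-byte view of the array.

```python
import io

import numpy as np
from PIL import Image

# Hypothetical stand-in for the file on disk: a synthetic image encoded as
# JPEG in memory, so the sketch runs without any real path.
pixels = np.zeros((32, 48, 3), dtype=np.uint8)
pixels[:, :24] = (255, 0, 0)
file_buf = io.BytesIO()
Image.fromarray(pixels).save(file_buf, format="JPEG")
encoded_jpg = file_buf.getvalue()  # comparable to what fid.read() returns

# Decoding path: PIL + numpy yield raw pixel values, not a JPEG stream.
image = Image.open(io.BytesIO(encoded_jpg))
image_np = np.array(image)
raw_bytes = image_np.tobytes()

print(image_np.shape)            # (height, width, channels)
print(len(raw_bytes))            # height * width * channels for uint8 data
print(encoded_jpg[:2])           # JPEG start-of-image marker
print(raw_bytes == encoded_jpg)  # compares raw pixels with compressed data
```

The raw pixel buffer has no JPEG header, which would be consistent with PIL reporting that it cannot identify the image when asked to open it.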
