I downloaded a test image from Wikipedia (the tree shown below) to compare Pillow and OpenCV (via `cv2`) in Python. Perceptually the two decoded images look the same, but their MD5 hashes don't match, and if I subtract one from the other the result is not even close to solid black (see the difference image below the original). The original image is a JPEG. If I convert it to a PNG first, the hashes match.

The last image shows the frequency distribution of the pixel value differences.

As Catree pointed out, my subtraction was causing integer overflow. I updated the code to convert to `dtype=int` before the subtraction (so that negative values show up) and then take the absolute value before plotting the difference. Now the difference image is perceptually solid black.
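To make the overflow concrete, here is a minimal sketch with two tiny made-up arrays (the values are hypothetical, not taken from the tree photo). It assumes only NumPy and compares a plain `uint8` subtraction with one done after casting to a signed type:

```python
import numpy as np

# Two tiny made-up "images" (hypothetical values, not from the tree photo)
a = np.array([[10, 200], [0, 255]], dtype=np.uint8)
b = np.array([[12, 199], [1, 255]], dtype=np.uint8)

# uint8 - uint8 stays uint8, so -2 wraps around to 254 and -1 to 255
wrapped = a - b

# Casting to a signed type first keeps the true differences
signed = a.astype(np.int16) - b.astype(np.int16)

print(wrapped.tolist())      # [[254, 1], [255, 0]]
print(signed.tolist())       # [[-2, 1], [-1, 0]]
print(np.abs(signed).max())  # 2
```

A difference of -1 therefore shows up as a nearly white pixel in the wrapped image, which would explain why the first difference image looked so far from black.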

This is the code I used:

from PIL import Image
import cv2
import sys
import md5
import numpy as np

def hashIm(im):
    imP = np.array(Image.open(im))

    # Convert to BGR and drop alpha channel if it exists
    imP = imP[..., 2::-1]
    # Make the array contiguous again
    imP = np.array(imP)
    im = cv2.imread(im)

    diff = im.astype(int)-imP.astype(int)

    cv2.imshow('cv2', im)
    cv2.imshow('PIL', imP)
    cv2.imshow('diff', np.abs(diff).astype(np.uint8))
    cv2.imshow('diff_overflow', diff.astype(np.uint8))

    with open('dist.csv', 'w') as outfile:
        # Reuse the signed diff computed above; im-imP would wrap around in uint8
        for i in range(-256, 256):
            outfile.write('{},{}\n'.format(i, np.count_nonzero(diff==i)))

    cv2.waitKey(0)
    cv2.destroyAllWindows()

    return md5.md5(im).hexdigest() + '   ' + md5.md5(imP).hexdigest()

if __name__ == '__main__':
    print sys.argv[1] + '\t' + hashIm(sys.argv[1])
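The script above is Python 2 and uses the old `md5` module. For anyone trying this on Python 3, the hashing step might be sketched as follows. `pixel_md5` is a hypothetical helper name, and the small array is a stand-in for a decoded photo:

```python
import hashlib
import numpy as np

def pixel_md5(arr):
    """Return the MD5 hex digest of an array's raw pixel bytes."""
    # tobytes() emits the data in C order, even for non-contiguous views
    return hashlib.md5(np.asarray(arr).tobytes()).hexdigest()

# Hypothetical 2x2 RGB image standing in for a decoded photo
rgb = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
bgr = rgb[..., ::-1]   # reversed channel view, not contiguous
same = bgr[..., ::-1]  # flipped back, equal to rgb again

off_by_one = rgb.copy()
off_by_one[0, 0, 0] += 1  # change a single channel value by 1

print(pixel_md5(rgb) == pixel_md5(same))        # True
print(pixel_md5(rgb) == pixel_md5(off_by_one))  # False
print(np.array_equal(rgb, same))                # True
```

A hash only says whether two pixel buffers are byte-identical. A single value that is off by one changes the digest completely, so a hash mismatch says nothing about how large the differences are.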

Original photo of a tree (from Wikipedia "Tree" article)

Frequency distribution updated to show negative values.

Updated difference


This is what I was seeing before I implemented the changes recommended by Catree.

Difference

Dist

BSMP
chew socks
  • i may be wrong, but i think you may be suffering from rounding errors here `np.array(Image.open(im))` and iirc imshow stretches the colors to meet the range (look at the actual values of `im-imP` they are likely to be very small) – Nullman Apr 22 '18 at 10:45
  • I just [tried](https://i.imgur.com/f0gkpD8.jpg) it, but no differences here (win10 + python3.6 + opencv-python==3.4.0.12 + Pillow==5.1.0). – Joost Apr 22 '18 at 10:50
  • @Nullman Interesting idea, but I'm not sure what you mean by rounding errors. `np.array(Image.open(im))` gives a `dtype=np.uint8`. I posted a graph of the distribution of values in `im-imP`. – chew socks Apr 22 '18 at 11:08
  • since joost managed to run it without issues, perhaps something indeed is different in your version of those libraries. what versions are you using? – Nullman Apr 22 '18 at 11:14
  • @Joost Interesting... It's possible the image was re-encoded when I uploaded it to SO. Do you mind trying it [with the original](https://upload.wikimedia.org/wikipedia/commons/e/eb/Ash_Tree_-_geograph.org.uk_-_590710.jpg) if you didn't already? – chew socks Apr 22 '18 at 11:14
  • @Nullman I'm on Ubuntu 16.04, Python2.7.12, opencv-2.4.9.1, opencv-python 3.4.0.12, numpy 1.14.2, Pillow 5.1.0. I'm in a `virtualenv` so at least the python libs should be up to date. – chew socks Apr 22 '18 at 11:17
  • i ran the code as well (these 2 lines are backwards! `im = cv2.imread(im) imP = np.array(Image.open(im))`) and it works fine for me, i got cv2 version 3.3.0 and pillow 4.3.0 on python 3.6.3 on win10 – Nullman Apr 22 '18 at 11:32
  • @Nullman Well I'll be...that's what I get for trying to make my post prettier. I edited with an identical copy `cat chck.py` now, which has them in the right order. It's interesting that both of you are on Win10. I'll see if I can try it on a Windows box tonight. – chew socks Apr 22 '18 at 11:39
  • i think the major difference here is that both of us are on python3 – Nullman Apr 22 '18 at 11:41
  • I did use both the one uploaded to SO and from wikipedia, but both showed the same. – Joost Apr 22 '18 at 16:10

1 Answer

The original image is a JPEG.

JPEG decoding can produce different results depending on the libjpeg version, compiler optimization, platform, etc.

Check which version of libjpeg Pillow and OpenCV are using.
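One way to check might look like the sketch below. It assumes `cv2.getBuildInformation()` and `PIL.features.pilinfo()` are available in your installed versions; `jpeg_lines` is a hypothetical helper that just filters the build reports for lines mentioning JPEG:

```python
import io

def jpeg_lines(build_info):
    """Return the stripped lines of a build report that mention JPEG."""
    return [line.strip() for line in build_info.splitlines()
            if "jpeg" in line.lower()]

if __name__ == "__main__":
    try:
        import cv2
        print("OpenCV:")
        print("\n".join(jpeg_lines(cv2.getBuildInformation())))
    except ImportError:
        print("OpenCV is not installed")

    try:
        from PIL import features
        report = io.StringIO()
        features.pilinfo(out=report)  # write Pillow's build report to a buffer
        print("Pillow:")
        print("\n".join(jpeg_lines(report.getvalue())))
    except ImportError:
        print("Pillow is not installed")
```

The exact wording of both reports varies between versions, so treat the filtered lines as a starting point rather than a precise version check.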

See this answer for more information: JPEG images have different pixel values across multiple devices or here.

BTW, `im-imP` overflows in uint8 (there is no way to have such a large number of big pixel differences without seeing them in your frequency chart). Try casting to an int type before doing your frequency computation.
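A minimal sketch of that frequency computation, assuming NumPy and two `uint8` arrays of the same shape. `diff_histogram` is a hypothetical helper name and the sample values are made up:

```python
import numpy as np

def diff_histogram(im_a, im_b):
    """Count each signed per-channel difference between two uint8 images."""
    # Cast to a signed type first so negative differences are preserved
    diff = im_a.astype(np.int16) - im_b.astype(np.int16)
    values, counts = np.unique(diff, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))

# Made-up sample data standing in for the two decoded images
a = np.array([[5, 5, 5], [100, 100, 7]], dtype=np.uint8)
b = np.array([[5, 6, 6], [100, 101, 4]], dtype=np.uint8)

print(diff_histogram(a, b))  # {-1: 3, 0: 2, 3: 1}
```

`np.unique` with `return_counts=True` tallies every distinct difference in one pass, instead of scanning the array once per candidate value.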

Catree
  • Thanks! Didn't know that about JPEG. I updated my frequency chart to use `int`. Turns out all the nonzero values of `im-imP` are negative. I haven't found the exact versions yet, but my `OpenCV` is pretty old and it's using the built-in sources for `libjpeg`. – chew socks Apr 22 '18 at 20:46