
I'm trying to use the imagehash library (https://pypi.org/project/ImageHash/) to identify visually identical files. I'm testing with 3 files. The second is just a reduced resolution of the first. File 3 is very different. Images below.

I wrote a simple Python program to diff the images using imagehash:

from PIL import Image
import imagehash
import os
import sys

def gethash(relPath):
    script_dir = os.path.dirname(__file__) #<-- absolute dir the script is in
    path = os.path.join(script_dir, relPath)
    return imagehash.phash(Image.open(path))
    
print(gethash(sys.argv[1]) - gethash(sys.argv[2]))

When I run it from the command line, images 1 and 2 have the same difference as images 1 and 3. What am I doing wrong with imagehash?

PS C:\pickle\lambda\hash> py .\testih.py .\img\1.jpg .\img\2.jpg
36
PS C:\pickle\lambda\hash> py .\testih.py .\img\1.jpg .\img\3.jpg
36
PS C:\pickle\lambda\hash> py .\testih.py .\img\2.jpg .\img\3.jpg
30

I have tried phash, average_hash, dhash, all with similar results. Thank you for any advice!

1.jpg https://picklepics.app/misc/1.jpg

2.jpg https://picklepics.app/misc/2.jpg

3.jpg https://picklepics.app/misc/3.jpg

Bktrout47
For anyone else who has this problem: I was able to solve it. One of my two test images was actually stored rotated 180 degrees, but had the EXIF "Orientation" tag set. So the hashes were computed for images that were upside down relative to each other. The solution was to load the image with PIL, check the orientation first, and then rotate the image to match how a person sees it before hashing. – Bktrout47 Feb 22 '22 at 17:51
