I'm currently trying to compute a hash of an image in Python. I have this working, but only partly.

However, I have this issue: Image1 and Image2 end up with the same hash even though they are different images. I need a hashing method that is more precise and can tell them apart.

Image1 = Image1

Image2 = Image2

The hash for both images is: faf0761493939381

I am currently using Pillow (from PIL import Image) together with the imagehash library, specifically imagehash.average_hash.

Here is my code:

import os
from PIL import Image
import imagehash


def checkImage():
    # Hash every reference image in images/ once.
    reference_hashes = {}
    for filename in os.listdir('images'):
        path = os.path.join('images', filename)
        reference_hashes[filename] = imagehash.average_hash(Image.open(path))

    # Compare each image in checkimage/ against every reference hash.
    for filename in os.listdir('checkimage'):
        path = os.path.join('checkimage', filename)
        check_image = imagehash.average_hash(Image.open(path))
        for ref_name, hashedImage in reference_hashes.items():
            if check_image == hashedImage:
                print("Same image")
            else:
                print("Not the same image")
            print(ref_name, hashedImage, filename, check_image)


checkImage()
Random Davis
Logan
  • edit the question to include the code, not a link to the code – Paul H Nov 24 '20 at 20:08
  • Ok, sorry I will do this now. – Logan Nov 24 '20 at 20:08
  • Your images are nearly identical, and you're using an [average hash](https://web.archive.org/web/20171112054354/https://www.safaribooksonline.com/blog/2013/11/26/image-hashing-with-python/), which will return the same result for very similar images. Are you sure that your images aren't just too similar? Does another hashing algorithm return different results? – Random Davis Nov 24 '20 at 20:12
  • Yes, that's what I assume, I came here for confirmation, and maybe some more specific/precise hashing techniques. – Logan Nov 24 '20 at 20:12
  • @Javadeveloper103 did you try the other hash types that the library offers? What do you specifically want our help with? Is there a specific application you need this hashing algorithm for? An application that requires the images' hashes to be in a specific format or anything like that? Would just a regular file hashing algorithm work? – Random Davis Nov 24 '20 at 20:15
  • I have looked into the other hashing techniques that this library offers, and they don't seem to be what I need. My goal is to check memes and delete an image from my folder if it is the same as one I already have. Because memes often reuse the same image with altered text, I need something quite specific/precise. – Logan Nov 24 '20 at 20:18

2 Answers

Try using hashlib. Open the file in binary mode and hash its bytes.

import hashlib

# Simple solution: read the whole file into memory and hash it
with open("image.extension", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()  # named digest to avoid shadowing the built-in hash()

# General-purpose solution that can process large files
def file_hash(file_path):
    # https://stackoverflow.com/questions/22058048/hashing-a-file-in-python

    sha256 = hashlib.sha256()

    with open(file_path, "rb") as f:
        while True:
            data = f.read(65536) # arbitrary number to reduce RAM usage
            if not data:
                break
            sha256.update(data)

    return sha256.hexdigest()

Thanks to Antonín Hoskovec for pointing out that it should be read binary (rb), not simple read (r)!
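
To tie this back to the goal in the question (removing repeated images from a folder), here is a minimal sketch that groups files by their SHA-256 digest. The names file_sha256 and find_duplicates and the throwaway demo files are made up for illustration; they are not part of the original answer.

```python
import hashlib
import os
import tempfile


def file_sha256(file_path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def find_duplicates(directory):
    """Map each digest shared by several files to the sorted list of their names."""
    by_digest = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_digest.setdefault(file_sha256(path), []).append(name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


# Demonstration with throwaway files standing in for images (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    samples = [("a.png", b"same bytes"), ("b.png", b"same bytes"), ("c.png", b"other bytes")]
    for name, data in samples:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    duplicates = find_duplicates(tmp)

print(list(duplicates.values()))  # [['a.png', 'b.png']]
```

Keep in mind that this only finds byte-identical files. The same picture saved again with different compression, or with different text on it, gets a completely different digest.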

ericl16384

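By default, imagehash checks whether images are nearly identical: it computes perceptual hashes, which are designed to give the same or similar values for visually similar images. The files you are comparing are far more alike than different, so they end up with the same hash. If you want a fingerprint that changes whenever the file contents change, you can use a different approach, such as a cryptographic hashing algorithm: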

import hashlib

def get_hash(img_path):
    # Return the `md5` checksum (hex string) of the file at img_path.
    with open(img_path, "rb") as f:
        img_hash = hashlib.md5()
        # Read in 8 KiB chunks; the := operator requires Python 3.8+.
        while chunk := f.read(8192):
            img_hash.update(chunk)
    return img_hash.hexdigest()
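
To show why an average hash treats near-identical images as equal, here is a simplified pure-Python sketch of the idea. It is not imagehash's actual implementation: the grids below are invented brightness values, and a real library first shrinks the image to a small grayscale grid (commonly 8x8) before thresholding.

```python
def average_hash_bits(pixels):
    """Return one bit per cell: 1 if the cell is brighter than the grid's mean.

    `pixels` is a list of rows of grayscale values (0-255).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


# Two hypothetical 4x4 "images"; the second differs slightly in a few cells,
# roughly what a small change can look like after heavy downscaling.
image_a = [
    [200, 200, 40, 40],
    [200, 200, 40, 40],
    [30, 30, 220, 220],
    [30, 30, 220, 220],
]
image_b = [
    [200, 190, 40, 45],
    [200, 200, 40, 40],
    [30, 35, 220, 210],
    [30, 30, 220, 220],
]

hash_a = average_hash_bits(image_a)
hash_b = average_hash_bits(image_b)
print(hamming_distance(hash_a, hash_b))  # prints 0: both grids give the same bits
```

Small brightness changes do not move any cell across the mean, so the bits are identical. A cryptographic hash of the file bytes, by contrast, changes completely when even one byte differs.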
Deneb