Ignore image name while getting hash

Question

I'm writing a program that takes an image as input, checks it against the images in a database, and outputs the image with the same hash.

However, when using `hash("imagepath")`, two copies of the same image give different hashes even when the only difference is the file name, which makes me believe the name is the issue.

Is there an easy way to ignore the name of the image? (The images are PNGs.)

`hash("imagepath")` hashes the file name only, not the contents. You need to read the contents. — Tim Roberts, Jan 20 '22 at 20:26
also `hash` is not a cryptographic hash function. depending on your needs you may need to choose a different function. — hiro protagonist, Jan 20 '22 at 20:29
reading files: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files — hiro protagonist, Jan 20 '22 at 20:30
Actually, most of the hash libraries want byte strings, which would be `hash(open("imagepath","rb").read())`. You may need to experiment. — Tim Roberts, Jan 20 '22 at 21:12
Does this answer your question? [Hashing a file in Python](https://stackoverflow.com/questions/22058048/hashing-a-file-in-python) — tgpz, Jan 20 '22 at 22:21
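
A minimal sketch of what the comments above describe: hash the bytes of the file rather than the path string, so the file name never enters the hash. It uses `hashlib.sha256` from the standard library instead of the built-in `hash()`, because `hash()` on strings and bytes is salted per interpreter run by default and so is not stable enough to store in a database. The function name `file_sha256` and the chunk size are assumptions made for illustration.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes.

    Only the contents are hashed; the file name is not part of the input.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large image does not have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two byte-for-byte identical copies of a PNG give the same digest under any name. Two files that merely look the same but were saved separately (different metadata or compression) can still differ, since this compares bytes, not pixels.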

score 0 · Accepted Answer · answered Jan 30 '22 at 19:50

How I solved it: I ended up not using hashing at all. Instead, by piecing together bits of code, I compute the image's average pixel value and then look for a database image with the same average. The averages are stored in a list, so the lookup returns an index, which I then use to find the image's name.

import requests
from PIL import Image #Pillow, needed for Image.open / resize / convert below

#Database of possible image average pixels
clone_imgs = [88.0465, 46.2568, 102.6426 ...]

image = <image url>
img_data = requests.get(image).content
with open('image.png', 'wb') as handler: #Download the image as "image.png" (Replace "image.png" with the path where you want to save it)
    handler.write(img_data)
img = Image.open(r"image.png") #Open the image for reading
img = img.resize((100, 100), Image.LANCZOS) #Downscale the image (Image.ANTIALIAS was an alias of LANCZOS and was removed in Pillow 10)
img = img.convert("L")
img_pixel_data = list(img.getdata()) #Grayscale pixel values of the downscaled image
img_avg_pixel = sum(img_pixel_data)/len(img_pixel_data) #Get the average pixel value

clone_img_index = clone_imgs.index(img_avg_pixel) #Find the same pixel value in the database

This worked for me but it has a few downsides:

The images need to be 100% identical in color (a single pixel being off can ruin the match).
Many different images can share the same average pixel value. My database only contained 800 images, so it still worked (however, I had to increase the downscaled size from 10x10 to 100x100 to avoid different images ending up with the same average).
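
One way to soften the first downside is to look for the closest stored average within a small tolerance instead of requiring an exact match with `list.index`. This is only a sketch, not what I actually ran: the helper name `find_closest_index` and the default tolerance are assumptions, and the list below holds just the three sample values shown above.

```python
def find_closest_index(averages, target, tolerance=0.01):
    """Return the index of the stored average nearest to target.

    Returns None when the list is empty or the nearest value is
    further away than tolerance.
    """
    if not averages:
        return None
    # Index of the entry with the smallest absolute difference to target.
    best_index = min(range(len(averages)), key=lambda i: abs(averages[i] - target))
    if abs(averages[best_index] - target) <= tolerance:
        return best_index
    return None


# Sample values only; the real database list is longer.
sample_averages = [88.0465, 46.2568, 102.6426]
match = find_closest_index(sample_averages, 46.2569)
```

A larger tolerance forgives more pixel noise but makes the second downside worse, because more unrelated images fall inside the accepted range.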

I guess this is too late, but there is a [perceptual hash library](https://github.com/JohannesBuchner/imagehash) for Python. These are designed to produce matching or similar hashes even if the images are slightly different. — Nick ODell, Jan 30 '22 at 20:04
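
To illustrate the idea behind such hashes, here is a standalone sketch of a simple "average hash" in plain Python. It is not the linked library's code or API; the function names and the 4x4 sample data are made up for illustration. Each pixel of a small grayscale image becomes one bit (brighter than the mean or not), and two hashes are compared by counting differing bits, so small changes give a small distance instead of a completely different hash.

```python
def average_hash(pixels):
    """Build a bit string from grayscale values: '1' where a pixel is brighter than the mean."""
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 grayscale images, flattened to 16 values in the range 0-255.
original = [10, 20, 200, 210, 15, 25, 205, 215, 12, 22, 202, 212, 18, 28, 208, 218]
slightly_changed = list(original)
slightly_changed[0] = 14  # one pixel nudged a little
inverted = [255 - p for p in original]

near = hamming_distance(average_hash(original), average_hash(slightly_changed))
far = hamming_distance(average_hash(original), average_hash(inverted))
```

With Pillow, the pixel list could come from something like `list(img.resize((8, 8)).convert("L").getdata())`, the same calls used in the answer above. For real use, the linked library is likely the better choice.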
