I'm using the ImageHash module to get hashes of images. I have this code:

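import imagehash
from PIL import Image

hashSize = 8
imghash3 = []
image = "pic1.jpg"

# Default usage: hash_size defaults to 8
imghash1 = imagehash.phash(Image.open(image))
print(imghash1)
# >>> d1d1f1f3f3737373

# Same call with hashSize passed explicitly, converted to a hex string
imghash2 = str(imagehash.phash(Image.open(image), hashSize))
print(imghash2)
# >>> 11b97c7eb158ac

# Hex string -> integer -> binary string, left-padded to 64 characters
imghash3.append(bin(int(imghash2, 16))[2:].zfill(64))
print(imghash3)
# >>> ['0000000000010001101110010111110001111110101100010101100010101100']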

So imghash1 is the basic usage of the module.

What I don't understand is what kind of transformation hashSize applies to the original hash in imghash2, and how the third statement converts imghash2 into a 64-bit string.

Paul Rooney
Hyperion

1 Answer

During the phash computation the original image is resized. The hashSize value is passed as the hash_size parameter, which controls the height and width of the resized image.

The algorithm can be found here. This is the implementation of the first step (reduce size):

image = image.convert("L").resize((hash_size, hash_size), Image.ANTIALIAS)

See the source of imagehash.phash for the remaining steps.
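To see why hash_size matters for the length of the output, here is a rough sketch. It assumes the hash is built from a hash_size x hash_size grid of bits, which is what the resize step above suggests. The helper name hash_dimensions is made up for this illustration and is not part of imagehash.

```python
import math


def hash_dimensions(hash_size):
    """Return (number of bits, max hex digits) for a square hash grid.

    Hypothetical helper for illustration only; not an imagehash function.
    """
    bits = hash_size * hash_size      # one bit per cell of the grid
    hex_digits = math.ceil(bits / 4)  # each hex digit encodes 4 bits
    return bits, hex_digits


for size in (8, 16):
    bits, hex_digits = hash_dimensions(size)
    print(size, bits, hex_digits)
```

With hash_size = 8 this gives 64 bits, which is why the snippet in the question pads the binary string to 64 characters.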


Let's see what the line imghash3.append(bin( int(imghash2, 16))[2:].zfill(64)) does.

In [16]: imghash2 = '11b97c7eb158ac'

First, it converts the hexadecimal string into an integer:

In [17]: int(imghash2, 16)
Out[17]: 4989018956716204

Then the builtin bin function converts the integer into a binary string:

In [18]: bin( int(imghash2, 16))
Out[18]: '0b10001101110010111110001111110101100010101100010101100'

The first two characters (the '0b' prefix) are dropped with a string slice:

In [19]: bin( int(imghash2, 16))[2:]
Out[19]: '10001101110010111110001111110101100010101100010101100'

Finally, zfill(64) pads the string with zeros on the left to make it 64 characters long:

In [20]: bin( int(imghash2, 16))[2:].zfill(64)
Out[20]: '0000000000010001101110010111110001111110101100010101100010101100'
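
The same steps can be collapsed into a single format call. This is only a sketch of an equivalent approach; the function name hex_to_bits and its width parameter are my own and do not come from imagehash.

```python
def hex_to_bits(hex_hash, width=64):
    """Convert a hex string to a binary string, zero-padded to `width` chars.

    Hypothetical helper; equivalent to bin(int(h, 16))[2:].zfill(width).
    """
    value = int(hex_hash, 16)                 # hex string -> integer
    return format(value, "0{}b".format(width))  # integer -> padded binary


bits = hex_to_bits("11b97c7eb158ac")
print(bits)
print(len(bits))
```

The format spec "064b" handles both the binary conversion and the padding, so there is no '0b' prefix to slice off.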
Konstantin