13

I know you can use pHash from .NET or Java, but I would like a pure .NET (preferably) or Java implementation. Are there any others available? I am interested in the image hashing functionality specifically.

A perceptual hash is a way of creating a numeric hash of images and then being able to compare those hashs to see if the images are similar. It allows for really fast image recognition.

Jim McKeeth
  • 38,225
  • 23
  • 120
  • 194

1 Answers1

13

Here is an Java implementation of pHash for images by Elliot Shepherd.

om-nom-nom
  • 62,329
  • 13
  • 183
  • 228
  • 1
    This implementation suffers a major flaw. In step 6, it should not skip the first row and column of the 8x8 DCT Matrix. Just remove the condition and it should be fine. – Grooveek Apr 03 '15 at 08:45
  • Not only that, but it is not as accurate as the phash.org pHash. In extensive tests I get a lot of false-positives, which I do not get when testing with phash.org. The hashes are different, and the one produced by this code isn't as accurate somehow. – ndtreviv Sep 19 '16 at 11:33
  • @ndtreviv how did you measured accuracy? – om-nom-nom Sep 19 '16 at 13:09
  • I hashed over 30 million images, then queried for similarity and output to a file for visual checking. – ndtreviv Sep 19 '16 at 15:30
  • @om-nom-nom I then took the images that were picked up as similar, but were blatantly not, and tested against phash.org hashing, which correctly identified them as being not similar. There must be something that phash.org is doing that this implementation is not that loses accuracy. – ndtreviv Sep 20 '16 at 10:45
  • Guys, how did you solve this problem? I need to use pHash in Android, and hoped that the implementation you're talking about was fine. – Lorenzo Camaione Oct 16 '16 at 17:14
  • @LorenzoCamaione The DCT algorithm is screwed on it. That's one of the problems. The other one is that it uses the blue channel, rather than converting the image to 8-bit grayscale. Take a look at the python implementation (https://github.com/JohannesBuchner/imagehash/blob/master/imagehash/__init__.py), which uses scipy's DCT and PIL image library. It has better results. – ndtreviv Nov 11 '16 at 13:52
  • @LorenzoCamaione You can use this, but maybe change a couple of things, ie: look into using the DCT offered by JTransforms library (https://github.com/wendykierp/JTransforms); When you transform into greyscale, consider doing what PIL does, which is to adhere to the ITU-R 601-2 luma transform: `L = R * 299/1000 + G * 587/1000 + B * 114/1000` You'll instantly get better results. – ndtreviv Nov 11 '16 at 14:24
  • Also, consider using https://github.com/mortennobel/java-image-scaling for your image resizing – ndtreviv Nov 11 '16 at 14:37
  • Thnaks guys. At the end I found an open source application (Android) made by some guys in a Stanford course. In this application they implemented Histogram-based Image Hash algorithm without using any extern libraries. This allowed me to get this feature done. Anyway thank you everybody. – Lorenzo Camaione Nov 13 '16 at 10:16
  • Since I cannot post a direct answer, hopefully this will help others. This looks like a great new Java library for this problem: https://github.com/KilianB/JImageHash – Mike Thomsen Dec 12 '18 at 11:46