
Through StackOverflow I came across pHash, a C++ perceptual hashing library for fingerprinting audio, video, images, and text. It recently gained preliminary bindings for PHP, C# and Java.

I'm interested in studying these algorithms, and I'm wondering whether there are any open-source, pure Python or PHP implementations of the same or similar algorithms. This would make my life a lot easier.

Alix Axel
  • Are you looking for these implementations so you can study them, or just use them? Or both, I suppose. I ask because the pHash site has a pretty nice description of its various algorithms: http://www.phash.org/docs/design.html – zdav Jul 09 '10 at 22:22
  • @zdav: Maybe both. But I'm more eager to learn than to use it. – Alix Axel Jul 09 '10 at 22:24
  • @zdav: That's interesting, but all I see are graphs. I've no idea how to compute the Hamming distance of a video or an audio file, or how to hash the results so that they can be easily looked up for comparison. I like the math and advanced CS (I'm trying to get into a CS Master's degree this year), and studying code in a language I'm more familiar with would be easier for me. If that's not possible, I may try to port the C++ myself. – Alix Axel Jul 10 '10 at 13:10
  • Hamming distance is one of the basics of cryptology and hashing, so sooner or later you may have to get to grips with it. The syntax of C++ and PHP is pretty similar, so C++ programs should be fairly readable once you have understood the concepts of the language. Christoph Zauner has published his master's thesis on the pHash website (http://www.phash.org/docs/pubs/thesis_zauner.pdf). It has a section on image hashing and a lot of theoretical discussion. It looks German on the outside, but the thesis itself is written in English. – syck Dec 15 '16 at 12:36

1 Answer


I have been searching on Google, but not much has come up. Since it seems you want the code for academic purposes, I would suggest:

  • Hit Wikipedia - look up each algorithm and get a feel for how it works

  • Check the pHash site's mailing list - I doubt you are the first person to be curious.

  • Email the authors and ask what sources they used (books, papers, etc.)

  • Use a bookstore, library, etc. to find your own sources

I personally find that studying code is very ineffective at teaching algorithms (at first anyway, until you have a feel for the overall process).
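That said, for a feel of the overall process (hash the content, then compare hashes by Hamming distance), here is a minimal pure-Python sketch. To be clear, this is not pHash's actual algorithm: it is a much simpler "average hash" that I'm using purely as an illustration, and the function names and the tiny 4x4 "images" are made up for the example. A real pipeline would first convert the image to grayscale and downscale it.

```python
def average_hash(pixels):
    """Return an integer hash of a small grayscale image (list of rows of ints)."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 4x4 grayscale images with values in 0-255.
original = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Same picture, slightly brighter: the hash should barely change.
brightened = [[min(255, v + 20) for v in row] for row in original]
# Inverted picture: every bit of the hash should flip.
inverted = [[255 - v for v in row] for row in original]

h_original = average_hash(original)
print(hamming_distance(h_original, average_hash(brightened)))
print(hamming_distance(h_original, average_hash(inverted)))
```

A small distance suggests the two inputs are perceptually similar and a large one suggests they are different. Where to put the threshold is a tuning decision that this sketch does not address.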

zdav