We have a database of images for which I have calculated a perceptual hash (PHASH) using Dr. Neal Krawetz's method, as implemented by David Oftedal.
The part of the sample code that calculates the difference between two of these hashes (stored as longs) is:
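    ulong hash1 = AverageHash(theImage);
    ulong hash2 = AverageHash(theOtherImage);

    // bitCounts is not shown here; it is presumably a 256-entry lookup table
    // where bitCounts[b] is the number of set bits in the byte value b.
    uint BitCount(ulong theNumber)
    {
        uint count = 0;
        // Sum the set bits one byte at a time.
        for (; theNumber > 0; theNumber >>= 8) {
            count += bitCounts[(theNumber & 0xFF)];
        }
        return count;
    }

    // hash1 ^ hash2 has a 1 wherever the two hashes differ, so
    // 64 minus its bit count is the number of matching bits.
    Console.WriteLine("Similarity: " + ((64 - BitCount(hash1 ^ hash2)) * 100.0) / 64.0 + "%");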
The challenge is that I only know one of these hashes and I want to query SOLR to find other hashes in order of similarity.
A few notes:
- Using SOLR here (only alternative I have is HBASE)
- Want to avoid installing any custom Java into SOLR (happy to install an existing plugin)
- Happy to do lots of pre-processing in C#
- Happy to use multiple fields to store data as a bit string, long, etc
- Using SOLRNet as a client
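To make the "multiple fields" note concrete, here is a minimal sketch of the C# pre-processing I have in mind: turning one 64-bit hash into a bit string, a signed long, and a hex string. The class and method names (`HashFields`, `ToBitString`, `ToSignedLong`, `ToHex`) are made up for illustration and are not part of the sample app. The signed conversion assumes the hash would go into a signed 64-bit field, so a hash with its top bit set wraps to a negative number while keeping the same bit pattern.

```csharp
using System;

public static class HashFields
{
    // 64 characters of '0'/'1', most significant bit first.
    // Convert.ToString(long, 2) emits two's complement for negative values,
    // so the cast through long preserves all 64 bits.
    public static string ToBitString(ulong hash)
    {
        return Convert.ToString(unchecked((long)hash), 2).PadLeft(64, '0');
    }

    // Same 64 bits reinterpreted as a signed long (assumption: the target
    // field is a signed 64-bit integer).
    public static long ToSignedLong(ulong hash)
    {
        return unchecked((long)hash);
    }

    // 16 lowercase hex digits, the format the console app prints.
    public static string ToHex(ulong hash)
    {
        return hash.ToString("x16");
    }

    public static void Main()
    {
        ulong hash = 0x004143737f7f7f7fUL; // hash of phash-test-001.jpg
        Console.WriteLine(ToHex(hash));
        Console.WriteLine(ToBitString(hash));
        Console.WriteLine(ToSignedLong(hash));
    }
}
```

All three values carry the same information; which of them (if any) is useful depends on what SOLR can actually compare at query time, which is the open question.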
Edit: some extra information (apologies, I was caught up in the problem and assumed it was a widely known area). Here is a direct download of the C# console / sample app: http://01101001.net/Imghash.zip
An example output of this console app would be:
004143737f7f7f7f phash-test-001.jpg
0041417f7f7f7f7f phash-test-002.jpg
Similarity: 95.3125%
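As a sanity check, the similarity figure above can be reproduced from the two hex strings alone. This is a minimal standalone sketch, not the sample app's code: `HashSimilarity`, `ParseHex` and `Similarity` are names I made up, and the bit count uses a simple clear-the-lowest-bit loop instead of the app's `bitCounts` lookup table. The two hashes differ in 3 of 64 bits, which gives (64 - 3) * 100 / 64 = 95.3125.

```csharp
using System;
using System.Globalization;

public static class HashSimilarity
{
    // Parse a 16-digit hex string (as printed by the console app) into a ulong.
    public static ulong ParseHex(string hex)
    {
        return ulong.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    // Count set bits by repeatedly clearing the lowest set bit.
    public static int BitCount(ulong value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1;
            count++;
        }
        return count;
    }

    // Percentage of the 64 bit positions in which the two hashes agree.
    public static double Similarity(ulong hash1, ulong hash2)
    {
        return (64 - BitCount(hash1 ^ hash2)) * 100.0 / 64.0;
    }

    public static void Main()
    {
        ulong hash1 = ParseHex("004143737f7f7f7f"); // phash-test-001.jpg
        ulong hash2 = ParseHex("0041417f7f7f7f7f"); // phash-test-002.jpg

        string percent = Similarity(hash1, hash2).ToString(CultureInfo.InvariantCulture);
        Console.WriteLine("Similarity: " + percent + "%"); // prints "Similarity: 95.3125%"
    }
}
```

So ordering by similarity is the same as ordering by the number of differing bits (the Hamming distance) between the known hash and each stored hash, smallest first.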