I have a number of blockhash based fingerprints of images stored in a table along with the web site member who uploaded it and the local URL of the image:
member varchar(8)
fingerprint char(64)
url varchar(80)
I am trying to do a Hamming Distance calculation on these hashes to determine how likely they are to be a match (reference mysql hamming distance between two phash).
Given the easiest way I have seen to do this is to use MySql's bit_count function to XOR the two and produce a total number of on bits, I know that I have to break the 64-character hash into 4 chunks, then convert each to unsigned integer before feeding it to bit_count. So, I have a query that does such (run from Linux command line, hence the argument variables):
select bit_count(cast(conv(substr('$1', 1, 16), 16, 10) as unsigned) ^ cast(conv(substr($2, 1, 16), 16, 10) as unsigned)) +
bit_count(cast(conv(substr('$1', 17, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 17, 16), 16, 10) as unsigned)) +
bit_count(cast(conv(substr('$1', 33, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 33, 16), 16, 10) as unsigned)) +
bit_count(cast(conv(substr('$1', 49, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 49, 16), 16, 10) as unsigned));
..And this produces the proper results between two fingerprints.
However, I need a query that will find matching fingerprints from anyone other than the member in question. Basically:
select member, url
from images
where (Hamming Distance between <fingerprint> and (select hashes from member) < 10)
AND member != "<value>"
I think I might want to create a stored procedure to determine the Hamming Distance, then maybe limit the results I have to check from the entire database to something like those matching the first 10 characters. But is there a better way?