I have a website with a lot of content and I am working on removing duplicates. To do this, I need to compare two strings and get a similarity percentage. I am using the Ruby simhash gem: https://github.com/bookmate/simhash
The gem takes a string and returns an integer hash. I am not sure how to compare the two hashes.
```ruby
require 'simhash'

x = 'King Gillette'.simhash(:split_by => //)
y = 'King Camp Gillette'.simhash(:split_by => //)
x # => 13716569836
y # => 13809628900
```
Can I subtract one hash from the other and turn the result into a percentage? Would that number indicate how different the two strings are?
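Simhash values are usually compared by Hamming distance (the number of bit positions in which the two hashes differ) rather than by subtraction. The numeric difference is dominated by the high-order bits, whereas simhash treats every bit as equally significant. The pure-Ruby sketch below shows that comparison. The helpers `hamming_distance` and `simhash_similarity` are hypothetical and not part of the gem. The 64-bit hash width is also an assumption and should match whatever width the hashes were actually generated with.

```ruby
# Hamming distance: count the bit positions where the two hashes differ.
# Assumes non-negative integer hashes (to_s(2) of a negative number has a '-' sign).
def hamming_distance(a, b)
  raise ArgumentError, 'expected non-negative hashes' if a.negative? || b.negative?
  (a ^ b).to_s(2).count('1')
end

# Similarity as the fraction of matching bits.
# ASSUMPTION: hash_bits = 64; pass the width your hashes were built with.
def simhash_similarity(a, b, hash_bits = 64)
  1.0 - hamming_distance(a, b).fdiv(hash_bits)
end

x = 13716569836
y = 13809628900

distance = hamming_distance(x, y)
percent = (simhash_similarity(x, y) * 100).round(2)
puts "Hamming distance: #{distance}"
puts "Similarity: #{percent}%"
```

A small Hamming distance (relative to the hash width) suggests near-duplicates. Any cut-off for "duplicate" would have to be tuned on your own content.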