
I'm learning about LSH and minhashing, and I'm trying to understand the rationale for hashing the signature matrix:

We divide the signature matrix into bands and hash each band's portion of every column into one of k buckets (using which hash function?). Why does this make sense? If we use a regular hash function, then even a slight difference between two columns would probably send them to different buckets.
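To make the setup concrete, here is a minimal sketch of the banding step as I understand it. The function name `lsh_candidate_pairs`, the toy signatures, and the use of Python's built-in `hash` on a tuple as the "regular" hash function are all my own assumptions for illustration, not taken from any particular reference.

```python
from collections import defaultdict

def lsh_candidate_pairs(signatures, bands, rows, num_buckets):
    """Return pairs of items that share a bucket in at least one band.

    signatures: dict mapping item id -> list of bands * rows minhash values.
    """
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)  # a separate bucket table for each band
        for item, sig in signatures.items():
            # Only this band's slice of the column is hashed, not the whole column.
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[hash(band) % num_buckets].append(item)
        # Every pair of items in the same bucket becomes a candidate pair.
        for items in buckets.values():
            for i in range(len(items)):
                for j in range(i + 1, len(items)):
                    candidates.add(tuple(sorted((items[i], items[j]))))
    return candidates

# Hypothetical toy signatures with 3 bands of 2 rows each.
sigs = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1, 2, 9, 9, 5, 7],  # agrees with A only in the first band
    "C": [8, 8, 8, 8, 8, 8],
}
print(lsh_candidate_pairs(sigs, bands=3, rows=2, num_buckets=1000))
```

In this sketch A and B differ in two of the three bands, yet they still land in the same bucket for the first band, so a single fully matching band is enough for a pair to become a candidate.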

I do understand the relation between the signature matrix and Jaccard distance, but I don't understand the next step, which seems to be ordinary hashing that distributes items evenly across buckets.

Elimination
