The key idea of Locality sensitive hashing (LSH) is that neighbor points, v are more likely
mapped to the same bucket but points far from each other are more likely mapped to different buckets. In using Random projection, if the the database contains N samples each of higher dimension d, then theory says that we must create k randomly generated hash functions, where k is the targeted reduced dimension denoted as g(**v**) = (h_1(v),h_2(v),...,h_k(v))
. So, for any vector point v, the point is mapped to a k-dimensional vector with a g-function. Then the hash code is the vector of reduced length /dimension k and is regarded as a bucket. Now, to increase probability of collision, theory says that we should have L such g-functions g_1, g_2,...,g_L
at random. This is the part that I do not understand.
Question : How to create multiple hash tables? How many buckets are contained in a hash table?
I am following the code given in the paper Sparse Projections for High-Dimensional Binary Codes
by Yan Xia et. al Link to Code
In the file Coding.m
dim = size(X_train, 2);
R = randn(dim, bit);
% coding
B_query = (X_query*R >= 0);
B_base = (X_base*R >=0);
X_query
is the set of query data each of dimension d and there are 1000 query samples; R
is the random projection and bit is the target reduced dimensionality. The output of B_query
and B_base
are N
strings of length k
taking 0/1 values.
Does this way create multiple hash tables i.e. N
is the number of hash tables? I am confused as to how. A detailed explanation will be very helpful.