The context and the problem below are only examples that can help to visualize the question.
Context: Let's say that I'm continously generating random binary vectors G with length 1x64 (whose values are either 0 or 1).
Problem: I don't want to check vectors that I've already checked, so I want to create a kind of table that can identify what vectors are already generated before.
So, how can I identify each vector in an optimized way?
My first idea was to convert the binary vectors into decimal numbers. Due to the maximum length of the vectors, I would need 2^64 = 1.8447e+19
numbers to encode them. That's huge, so I need an alternative.
I thought about using hexadecimal coding. In that case, if I'm not wrong, I would need nchoosek(16+16-1,16) = 300540195
elements, which is also huge.
So, there are better alternatives? For example, a kind of hash function that can identify that vectors without repeating values?