I need an algorithm or logic that generates a single token (alphanumeric) from a list of tokens. The generated token must allow us to verify whether a given token is part of it or not.

Example: Consider a scenario you have 4 tokens like the below

 Input codes: NFU122JKMO, MUE4UJ83RT, MA783FHNZS, ODNJU345KN (assume)

I need to generate a single code which is a combination of all of the above. The generated code must be alphanumeric, like the following:

 Generated code: NIDU8934DF (assume)

Now, I need to validate whether an input code is part of the generated code or not.

So it must return a Boolean value: true or false.

To be specific, I need to generate a code which contains the information of all the input tokens.

I have looked at encryption and decryption algorithms, but they do not suit my needs. Please share your ideas and suggest an algorithmic approach.

Thanks in Advance.

Obuli Sundar
  • Basically you want a hash. Ever heard of MD5? – Kryptos Jun 12 '15 at 10:47
  • @Kryptos Sorry, I am new to this. Is it possible to check whether the input code is a part of the generated code? – Obuli Sundar Jun 12 '15 at 10:56
  • Should this operation be secure or not? For example, can anyone produce such a token knowing the input tokens? And can anyone find out the input tokens from such a token? – divanov Jun 12 '15 at 10:58
  • @divanov In my case I need to generate that kind of token: a function to generate it and a function to validate whether an input token is part of the generated token or not. – Obuli Sundar Jun 12 '15 at 11:04
  • It's called concatenation and doesn't even require crypto. – CodesInChaos Jun 12 '15 at 11:08
  • @CodesInChaos: consider if you have 1 lakh tokens and you concatenate them; the generated token will be out of range. – Obuli Sundar Jun 12 '15 at 11:14
  • You can't do much better than that. There is no algorithm that can compress all data. (btw *lakh* is not used outside India. You'd rather write 100k.) – CodesInChaos Jun 12 '15 at 11:16
  • Almost the same question as [Storing 1 million phone numbers](http://stackoverflow.com/questions/5262465/storing-1-million-phone-numbers). The best you can do is sorting the tokens and then using huffman coding on their differences. Assuming the tokens are uniformly distributed, you'll at most save log_36(n!) characters in total, which works out to [3-4 characters per token](http://www.wolframalpha.com/input/?i=log_36%28n%21%29%2Fn+where+n%3D100000) for over 100k tokens. – CodesInChaos Jun 15 '15 at 10:53
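
The sort-and-delta idea from the last comment can be sketched without the Huffman step: interpret each token as a base-36 number, sort, and store only the first value plus the differences, which are smaller than the tokens themselves. This is a minimal illustration, not code from the thread; the function names and the fixed 10-character width are assumptions based on the example tokens.

```python
BASE36_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def tokens_to_deltas(tokens):
    # Interpret each token as a base-36 number, sort, and keep only the
    # first value plus the gaps between consecutive values.
    values = sorted(int(t, 36) for t in tokens)
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def deltas_to_tokens(deltas, width=10):
    # Undo the delta step and render each value back as a fixed-width
    # base-36 string (the example tokens are all 10 characters long).
    def to_base36(n):
        out = ""
        while n:
            n, r = divmod(n, 36)
            out = BASE36_DIGITS[r] + out
        return out.rjust(width, "0")

    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return [to_base36(v) for v in values]

tokens = ["NFU122JKMO", "MUE4UJ83RT", "MA783FHNZS", "ODNJU345KN"]
assert deltas_to_tokens(tokens_to_deltas(tokens)) == sorted(tokens)
```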

1 Answer


What you want is a Bloom filter. You can model it so that its bit array, once serialized, resembles a hash (it won't really be a hash). You will also have to tune it for the number of elements you expect in the list. Since it is probabilistic, you can't be 100% sure that a supposed member is an actual member.
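
A minimal sketch of the Bloom-filter idea, assuming the tokens are plain strings and a hex string is acceptable as the "generated code". The sizes, hash count, and function names below are illustrative choices, not part of the answer, and would have to be tuned to the expected number of tokens and the acceptable false-positive rate.

```python
import hashlib

NUM_BITS = 128    # size of the bit array (tune to the number of tokens)
NUM_HASHES = 4    # number of hash functions (tune to the false-positive rate)

def _bit_positions(token):
    # Derive NUM_HASHES bit positions from the two halves of a SHA-256 digest.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:16], "big")
    h2 = int.from_bytes(digest[16:], "big")
    return [(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)]

def generate_code(tokens):
    # OR the bits of every token into one integer and hex-encode it,
    # so the result is a single alphanumeric "generated code".
    bits = 0
    for token in tokens:
        for pos in _bit_positions(token):
            bits |= 1 << pos
    return format(bits, "032x")   # 128 bits -> 32 hex characters

def might_contain(code, token):
    # True means "possibly in the set"; False means "definitely not".
    bits = int(code, 16)
    return all(bits & (1 << pos) for pos in _bit_positions(token))

code = generate_code(["NFU122JKMO", "MUE4UJ83RT", "MA783FHNZS", "ODNJU345KN"])
print(might_contain(code, "NFU122JKMO"))   # True
print(might_contain(code, "ZZZZZZZZZZ"))   # almost certainly False
```

Because the filter is probabilistic, might_contain can return true for a token that was never added; enlarging the bit array or adjusting the hash count trades code length against that false-positive rate.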

Artjom B.
  • This might not work for Obuli's requirements, because a query returns either "possibly in set" or "definitely not in set". Too bad we don't know these requirements. – divanov Jun 12 '15 at 11:52
  • Yes, since this approach is probabilistic, I can't get an exact result from it. – Obuli Sundar Jun 14 '15 at 04:23
  • If you want the result to be exact, then you somehow need to devise a way to map your inputs to numbers that are powers of 2. If you then OR them together, you get your token, which will be at least as long as the input set size in bits. To be secure, it must be twice as big. It seems there is no way of doing that with those requirements. – Artjom B. Jun 14 '15 at 07:30
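
A sketch of the exact, bit-per-token variant described in the last comment. It assumes the complete universe of possible tokens is known up front so that each token can be assigned a fixed power-of-two bit; the names below are illustrative, not from the thread.

```python
# The complete set of tokens that can ever appear must be known in advance.
UNIVERSE = ["NFU122JKMO", "MUE4UJ83RT", "MA783FHNZS", "ODNJU345KN"]
BIT_FOR_TOKEN = {token: 1 << i for i, token in enumerate(UNIVERSE)}

def generate_code(tokens):
    # OR together the power-of-two bit assigned to each input token.
    mask = 0
    for token in tokens:
        mask |= BIT_FOR_TOKEN[token]
    return format(mask, "x")   # hex string: one character per four possible tokens

def contains(code, token):
    # Exact membership test: no false positives, unlike the Bloom filter.
    return bool(int(code, 16) & BIT_FOR_TOKEN[token])

code = generate_code(["NFU122JKMO", "MA783FHNZS"])
print(contains(code, "NFU122JKMO"))   # True
print(contains(code, "MUE4UJ83RT"))   # False
```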