I want to convert a string of at most 15 characters to a unique long. I am trying to use BigInteger's longValue() method for this.

BigInteger bigInt = new BigInteger("abcdeabcdeabcda".getBytes());
long n = bigInt.longValue();
  1. Can we avoid collisions in the long value for strings of up to 15 characters?
  2. The string can contain alphanumeric and special characters.
  3. The goal is not to encrypt the string into a long, but to improve the performance of count(distinct) in Hive queries.
  4. We have observed that count(distinct) in Hive performs well when a long is used instead of a string.
  5. We don't want an approximate or probabilistic count distinct. We want an exact count distinct.

Thanks in Advance

Prabakaran
  • There was a similar problem here: http://stackoverflow.com/questions/9309723/how-can-i-generate-a-long-hash-of-a-string – Athanor Aug 13 '14 at 08:47

1 Answer

No, you can't - at least not without collisions.

ASCII uses 7 bits per character, and 15 * 7 = 105 bits - you cannot fit that into a 64-bit long.
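To make the overflow concrete, here is a minimal sketch of the conversion from the question. The class name, the helper `toLong`, and the two input strings are made up for illustration, and the charset is pinned to US-ASCII as an assumption. Since `longValue()` returns only the low-order 64 bits, two strings that share their last 8 characters end up with the same long.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

class LongValueCollision {
    // Same conversion as in the question, with an explicit charset (assumption).
    static long toLong(String s) {
        return new BigInteger(s.getBytes(StandardCharsets.US_ASCII)).longValue();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: they differ only in the first 7 characters.
        String a = "aaaaaaa12345678";
        String b = "bbbbbbb12345678";
        // longValue() keeps only the low-order 64 bits, i.e. the last 8 bytes.
        System.out.println(toLong(a) == toLong(b)); // true
    }
}
```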

You may not need full ASCII - perhaps a 64-character alphabet at 6 bits per character would do, but 15 * 6 = 90 bits, which is still way too long.

Even if case is irrelevant and you can get by without four of your alphanumeric characters, using base 32 you still have 15 * 5 = 75 bits, which is still too big for a 64-bit number.
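The arithmetic above can be checked with a short sketch. The class and method names are illustrative only. It prints the bits needed for 15 characters at 7, 6 and 5 bits per character, and the longest string that would fit in a single long at each width.

```java
class BitBudget {
    // Bits needed for `length` characters at `bitsPerChar` bits each.
    static int bitsNeeded(int length, int bitsPerChar) {
        return length * bitsPerChar;
    }

    // Longest string that fits in one 64-bit long at `bitsPerChar` bits per character.
    static int maxCharsInLong(int bitsPerChar) {
        return Long.SIZE / bitsPerChar;
    }

    public static void main(String[] args) {
        // 7 = full ASCII, 6 = 64-character alphabet, 5 = 32-character alphabet.
        for (int bits : new int[] {7, 6, 5}) {
            System.out.println(bits + " bits/char: 15 chars need " + bitsNeeded(15, bits)
                    + " bits; one long holds at most " + maxCharsInLong(bits) + " chars");
        }
    }
}
```

So even with a 32-character alphabet, a single long tops out at 12 characters.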

You will need to accept that there will be collisions but perhaps there are ways to reduce them. How are you generating these 15-character strings? Is there a pattern you can make use of?

The accepted answer to the question @Athanor links to has a good idea - use two longs. 2 * 64 = 128 bits, so your potentially 105-bit number using 7-bit ASCII would fit fine into two longs.
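One possible way to do the packing is sketched below. This is not the code from the linked answer; the class name, the `pack` method, and the layout (length plus the first 8 characters in the first long, the remaining 7 in the second) are my own assumptions, and it only works if every character is 7-bit ASCII.

```java
class TwoLongKey {
    // Packs an ASCII string of at most 15 characters into two longs without collisions.
    // Assumption: every character is 7-bit ASCII (code < 128).
    static long[] pack(String s) {
        if (s.length() > 15) throw new IllegalArgumentException("max 15 chars");
        long hi = s.length();   // store the length so "a" and "a\0" stay distinct
        long lo = 0;
        for (int i = 0; i < 15; i++) {
            char c = i < s.length() ? s.charAt(i) : 0;   // pad short strings with zeros
            if (c > 127) throw new IllegalArgumentException("non-ASCII character");
            if (i < 8) hi = (hi << 7) | c;   // chars 0..7: 4 + 56 = 60 bits
            else       lo = (lo << 7) | c;   // chars 8..14: 49 bits
        }
        return new long[] {hi, lo};
    }

    public static void main(String[] args) {
        long[] key = pack("abcdeabcdeabcda");
        System.out.println(key[0] + " " + key[1]);
    }
}
```

Both values stay non-negative because neither uses the sign bit, and two different strings always differ in at least one of the two longs.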

OldCurmudgeon
  • Thanks. This is what I understood - I have to split the 15-char string into two strings of 7 chars and 8 chars, then generate a long value for each using base 36: Long.parseLong("abcdeabcdeab", Character.MAX_RADIX). Now I have two longs - how can I convert them into one long? – Prabakaran Aug 13 '14 at 11:10
  • @Prabakaran - You cannot - not without risking collisions. You need to put both longs in your DB and adjust your distinct clause to use both. See [here](http://stackoverflow.com/a/54430/823393) for how. – OldCurmudgeon Aug 13 '14 at 11:30