2

I need a random string generator that generates an alpha-numeric string to use as an unique key in a distributed system that is 30 characters or less. It cannot contain any special characters.

Will RandomStringUtils#randomAlphanumeric work for this?

The underlying implementation uses java.util.Random.

The set of unique keys will probably be less than 100 billion, and the system needs to be able to handle up to 1000 records per second.

How can I prove that this strategy has a low enough probability of collision to work as a primary key generator?

cosbor11
  • 14,709
  • 10
  • 54
  • 69
  • How about generating identifiers using low and high ids? (http://stackoverflow.com/questions/282099/whats-the-hi-lo-algorithm) – Caramiriel Jan 09 '16 at 20:03

3 Answers3

4

java.util.Random implements a LCG algorithm and its period is 2^48 numbers, so RandomStringUtils will be as good as this implementation and 100 billion of 30-character strings would require ~ 1% of 2^48 random elements.

Note that java.util.Random is not cryptographically secure, so given some GUIDs it is possible to infer the next one, so I'd use another implementation that uses a cryptographically secure random number generator (e.g. java.util.SecureRandom).

  • 1
    Thanks for answering! Can you explain your quantitative figures; `its period`?? and `~ 1% of 2^48` – cosbor11 Jan 09 '16 at 07:04
  • 1
    Using java.util.Random, after 2^48 invocations you'll start getting the same numbers. In your case you need 100 billion of 30-character strings, so it means 3 trillion of random elements, and 3 trillion is approx. 1% of 2^48, meaning that you can generate 100 times 100 billion of 30-character strings before the sequence is repeated. –  Jan 09 '16 at 07:17
  • Ok, that makes more sense. Are you sure the algorithm does not repeat a sequence of chars until all combinations are exhausted? – cosbor11 Jan 09 '16 at 19:46
  • This is the theory for LCG, although java.util.Random doesn't state this fact explicitly. –  Jan 10 '16 at 01:29
1

Why you don't want to use java.util.UUID class? It returns random UUID of 32bit characters String. Example implementation:

import java.util.UUID;

public class GenerateUUID {

   public static final void main(String... aArgs){
     //generate random UUIDs
     UUID idOne = UUID.randomUUID();
     UUID idTwo = UUID.randomUUID();
     log("UUID One: " + idOne);
     log("UUID Two: " + idTwo);
   }

   private static void log(Object aObject){
     System.out.println(String.valueOf(aObject));
   }
} 
m.aibin
  • 3,528
  • 4
  • 28
  • 47
1

Random is not unique. Using random number generation to get "unique" values runs into the Birthday Problem https://en.wikipedia.org/wiki/Birthday_attack. This turns the a 1 in 2^48 probability into a 1 in 2^24 probability, which you'll end up hitting quicker than you think. Use UUIDs; they're designed to be universally unique.

32 characters:

UUID.randomUUID().toString.replace("-","")

22 characters:

UUID uuid = UUID.randomUUID();
String uuidStr = Base64.encodeBase64URLSafeString(ByteBuffer.wrap(new byte[16])
    .putLong(uuid.getMostSignificantBits())
    .putLong(uuid.getLeastSignificantBits())
            .array()).replace("=", "");
pdubs
  • 2,698
  • 1
  • 16
  • 15