1

I am trying to produce many random strings consist of 4 digits, and they should not repeat to each other. I don't know the exact number, but around few hundreds. I tried nextInt():

public static String generateLogID() {

    Random rdm = new Random();
    String s = "";
    for (int i=0;i<4;i++) {
        String digit = String.valueOf(rdm.nextInt(9));
        s = s.concat(digit);    
    }
    return s;
}

However, when it comes around No.70 or 80, it got repeat string. Theoretically there will be 10*10*10*10 possibilities, why it repeat so soon, and what should I do to avoid repeat? Thank you for any advice!

I used HashMap to save all record to avoid repeat, and it works so well.

HashMap<Integer, String> map = new HashMap<Integer, String>();
int count = 0;
for(loop conditions){
 String id = IDGenerator.generateLogID();
                while(map.containsValue(id)){
                    id = IDGenerator.generateLogID();
                    }
                map.put(count, id);
                count++;
}

But what I really want to know is why this generator generate repeat so soon, and is there other generate method which lower the repetition rate?

codingbunny
  • 105
  • 8
  • Keep a Map of all values and check if the map already contains the new random value? – Manu Nov 12 '15 at 08:54
  • Use a HashMap type for this. Here you have an example of String to [Hashmap](http://stackoverflow.com/questions/10514473/string-to-hashmap-java) – xFunkyTImes Nov 12 '15 at 08:59
  • Thank you for your comments, and I already added the HashMap method in the question, but is there any function which can lower the repetition and I don't have to use HashMap for only one or two hundreds random numbers? – codingbunny Nov 16 '15 at 04:31
  • The repetitions start so soon because it's truly random. One criterion to distinguish "hand-made" random-sequences from truly random ones is that there are too few repetitions. – piet.t Nov 16 '15 at 07:34
  • *"But what I really want to know is why this generator generate repeat so soon"* This is what _random_ means. It is also possible that the first three numbers are equal ... – Tom Nov 16 '15 at 11:41

3 Answers3

3

By the birthday problem, odds of to have a duplicate among 80 random 4-digit decimal numbers are 27.1%, growing to 39.1% for 100 such random values, 50% for 118 such random values. Thus what's observed is not surprising.

These odds can be computed as:
  p0 = 0
  pi+1 = 1-(1-pi)*(k-i)/k
where k is the number of equaly probable possible values (here k=10000).

To generate distinct random-like numbers, we can

  • Encipher a counter with a constant (secret) key using an appropriate cipher, leveraging a Format Preserving Encryption technique. That allows to handle very large k with O(log(k)) memory, and effort for each generated ID growing as O(log(k)).
  • Generate a random permutation of the integers in range [0..k-1] using the Fisher-Yates shuffle (the IDs are then the first elements of the shuffled array); this is simpler to code for moderate k, but requires O(k log(k)) memory and initial effort (an elegant implementation is in another answer, search IDs in an array).
Community
  • 1
  • 1
fgrieu
  • 2,724
  • 1
  • 23
  • 53
2

when it comes around No.70 or 80, it got repeat string. Theoretically there will be 10*10*10*10 possibilities, why it repeat so soon, and what should I do to avoid repeat?

This is a variation of the birthday paradox. Repeats will occur more often than people imagine. Realize that to be unique, each new number needs to be different to every previous number. As the list of previous numbers grows, it will quickly become the case that at least one of the new numbers will match one of the old. With 10^4 possible numbers, there is a 50% chance of a repeat after just 118 randomly generated numbers.

The problem is compounded here by a small error in your code. The bound of Random.nextInt is exclusive, so your use of rdm.nextInt(9) will generate digits from only 0 to 8. That means you're generating only 9^4 possible numbers, and there is a 50% chance of a repeat after just 96 randomly generated numbers (quite close to what you observed).

Okay, some other minor things:

  • Rather than generating 4 random digits individually, it would be simpler to generate a number from 0 to 9999 and pad it to 4 digits wide. String.format can do the padding for you:

    return String.format("%04d", rdm.nextInt(10000));
    
  • Creating a new random number generator on every method call is inefficient. I suggest you create and use a static instance:

    private static final Random rdm = new Random();
    public static String generateLogID() {
        return String.format("%04d", rdm.nextInt(10000));
    }
    

    Or (a bit sloppy but probably fine), call Math.random() and hammer the result into an integer:

    public static String generateLogID() {
        return String.format("%04d", (int)(Math.random() * 10000));
    }
    

How to prevent repeats:

The current way you're using the map to store previous numbers is extremely inefficient. Each generated number calls containsValue, which has to do a slow search through every previous number. (A HashMap can search fast for entries by key but not by value. The way you're using it at the moment, it's effectively a heavyweight ArrayList.)

  • Using a hash-based structure to detect duplicates is fine, but make it a set, not a map. Like this:

    private static final Random random = new Random();
    private static final HashSet<Integer> previousIDs = new HashSet<>();
    public static synchronized String generateUniqueLogID() {
        if (previousIDs.size() == 10000) throw new RuntimeException("Out of IDs!");
        int id;
        do {
            id = random.nextInt(10000);
        } while (!previousIDs.add(id));
        return String.format("%04d", id);
    }
    

    Each call to this method will generate a new one of the possible 4-digit IDs, guaranteed not to repeat.

    Note: I added a check for exhaustion of the possible IDs, because otherwise that situation would cause the code to fall into an infinite loop. Even if you only need a few hundred IDs at the moment, a potential infinite loop is a ticking time bomb that could be a hard to find problem later.

    I also declared the method synchronized to protect the HashMap, if you want to use the method from multiple threads. (A Random is already thread-safe, so this wasn't needed before.)

  • Taking advantage of the small range of the numbers, a BitSet could also work here and would be more compact in memory than the HashSet:

    private static final Random random = new Random();
    private static final BitSet previousIDs = new BitSet();
    private static int remainingIDs = 10000;
    public static synchronized String generateUniqueLogID() {
        if (remainingIDs == 0) throw new RuntimeException("Out of IDs!");
        int id;
        do {
            id = random.nextInt(10000);
        } while (previousIDs.get(id));
        previousIDs.set(id);
        remainingIDs--;
        return String.format("%04d", id);
    }
    
  • Both the HashSet and BitSet would degrade in performance when the number of generated IDs grows close to the maximum. E.g., having generated 9999 IDs, it would expect to try 10000 times before discovering the last free ID.

    A nicer solution, again taking advantage of the small range of numbers, is to stuff all possible IDs in an array, and then to generate an ID, select one at random from those known to be remaining:

    private static final int[] ids = new int[10000];
    private static int remainingIDs = ids.length;
    static {
        for (int i = 0; i < ids.length; i++) ids[i] = i;
    }
    private static final Random random = new Random();
    public static synchronized String generateUniqueLogID() {
        if (remainingIDs == 0) throw new RuntimeException("Out of IDs!");
        int index = random.nextInt(remainingIDs);
        int id = ids[index];
        ids[index] = ids[--remainingIDs];
        return String.format("%04d", id);
    }   
    

    It might take a bit of thought to see how this works, but it does, and it's efficient too, because it doesn't need to search previous values to check for duplicates.

  • There is a clever beast called a linear feedback shift register. It can be used to generate a shuffled sequence of elements without using any memory to store them in their shuffled form. Or to put it another way, it can generate random numbers in a range, producing each number exactly once – no more, no less. The catch is that the "random" order is the same random order every time, and it is not a particularly high quality of randomness either. So it cannot be used to, e.g., shuffle a deck of cards, but it's useful in certain special cases, and it's extremely efficient:

    private static int lfsr = 1; // start register in non-zero state
    private static final int MASK = 0x2015; // this mask value will generate numbers from 1 to 16383; see http://users.ece.cmu.edu/~koopman/lfsr/ for more
    private static int remainingIDs = 10000;
    public static synchronized String generateUniqueLogID() {
        if (remainingIDs == 0) throw new RuntimeException("Out of IDs!");
        remainingIDs--;
        do {
            lfsr = ((lfsr & 1) != 0) ? ((lfsr >>> 1) ^ MASK) : (lfsr >>> 1);
        } while (lfsr > 10000); // loop to exclude the unwanted (overlarge) numbers
        int id = lfsr - 1; // subtract one so we generate 0 to 9999 rather than 1 to 10000
        return String.format("%04d", id);
    }
    
  • Final possibility: Do you really need the IDs in a random order? Maybe a silly question, but let's not overlook a really easy way of providing unique IDs:

    private static int nextID = 0;
    public static synchronized String generateUniqueLogID() {
        if (nextID == 10000) throw new RuntimeException("Out of IDs!");
        return String.format("%04d", nextID++);
    }
    
Boann
  • 48,794
  • 16
  • 117
  • 146
  • One problem with the LFSR technique as illustrated is that knowing a number generated and the mask 0x2015, the next number can be easily computed. Even without knowledge of that constant, the next number can be computed from a small sequence of previous numbers. This goes against the accepted definition of random in CS: computationally indistinguishable from truly random. – fgrieu Nov 16 '15 at 09:10
  • 1
    @fgrieu CS is broader than just cryptography. There are plenty of cases where pseudorandom numbers are fine. But you're right that if OP requires the numbers to be unguessable then LFSR is unsuitable. And in that case any use of `Random` would also be unsafe and would need to be replaced by `SecureRandom`. – Boann Nov 16 '15 at 09:20
  • Agreed, I should have written "..a minimal definition of random in cryptography..". Still, with the LFSR technique, any odd ID _N_ at least 1 is always followed by (_N_-1)/2, that's a very recognizable property! – fgrieu Nov 16 '15 at 09:44
  • Thank you very much! This is exactly what I want to ask, and thank you for the advice! – codingbunny Nov 19 '15 at 13:50
  • {ids[index] = ids[--remainingIDs];} Very tricky:) Just overwrite the current ID with one that has definitly not been used yet. If the index of the current ID should accidentially be the last one in the list than it will be filtered out next time anyway {random.nextInt(remainingIDs);} – grenix Sep 07 '16 at 11:55
  • just discovered that this is somehow the modern implementation of the Fisher-Yates shuffle fgrieu metionend below. Its like replacing some card in a deck with the bottom card. ... I'm still exalted ::)) – grenix Sep 08 '16 at 12:33
0

You can try built in UUID generators in java. Docs http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html Like this

UUID.randomUUID().toString()

It will give you a unique identifier every time you call it.

sohan nohemy
  • 615
  • 5
  • 13
  • You will then still have to map your UUID eg. to an integer number between 0 and 9999. On this way the uniqueness can get lost. – grenix Sep 07 '16 at 08:35