1

I'd like to start off by saying this is a little more of a general question; not one pertaining to the specific examples that I have given, but simply a conceptual topic.

Example #1: I'm creating a truly random string with UUID.java. Let's say I never want to have the same UUID generated, ever. Here's an idea of the circumstance: (Let's assume that I'm saving/loading the List at the top- that's not the point)

Gist URL (I'm new to StackExchange- sorry!)

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class Example {

    /**
     * A final List<String> of all previous UUIDs generated with
     * generateUniqueID(), turned into a string with uuid.toString();
     */
    private static final List<String> PREVIOUS = new ArrayList<String>();

    /**
     * Generates a truly unique UUID.
     * 
     * @param previous
     *            A List<String> of previous UUIDs, converted into a string with
     *            uuid.toString();
     * @return a UUID generated with UUID.randomUUID(); that is not included in
     *         the given List<String>.
     */
    public static UUID generateUniqueID(List<String> previous) {
        UUID u = UUID.randomUUID();
        if (previous.contains(u.toString())) {
            return generateUniqueID(previous);
        }
        return u;
    }

    /**
     * Generates a truly unique UUID using the final List<String> PREVIOUS
     * variable defined at the top of the class.
     * 
     * @return A truly random UUID created with generateUniqueID(List<String>
     *         previous);
     */
    public static UUID generateUniqueID() {
        UUID u = generateUniqueID(PREVIOUS);
        PREVIOUS.add(u.toString());
        return u;
    }

}

Example #2: Okay, maybe UUID was a bad example, so let's use Random and a double. Here's another example:

Gist URL

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Example2 {

    /**
     * A final List<Double> of all previous double generated with
     * generateUniqueDouble(), turned into a string with Double.valueOf(d);
     */
    private static final List<Double> PREVIOUS = new ArrayList<Double>();

    /**
     * The RANDOM variable used in the class.
     */
    private static final Random RANDOM = new Random();

    /**
     * Generates a truly unique double.
     * 
     * @param previous
     *            A List<Double> of previous doubles, converted into a Double
     *            with Double.valueOf(d);
     * @return a UUID generated with UUID.randomUUID(); that is not included in
     *         the given List<Double>.
     */
    public static double generateUniqueDouble(List<Double> previous) {
        double d = RANDOM.nextDouble();
        if (previous.contains(Double.valueOf(d))) {
            return generateUniqueDouble(previous);
        }
        return d;
    }

    /**
     * Generates a truly unique double using the final List<Double> PREVIOUS
     * variable defined at the top of the class.
     * 
     * @return A truly random double created with generateUnique(List<Double>
     *         previous);
     */
    public static double generateUnique() {
        double d = RANDOM.nextDouble();
        PREVIOUS.add(Double.valueOf(d));
        return d;
    }

}

The point: Is this the most efficient method of doing something like this? Keep in mind I gave you examples, so they're pretty vague. Preferrably I wouldn't like to use any libraries for this, but if they really are a substantial difference in efficency please let me know about them.

Please let me know what you think in the responses :)

Justin J. O'Boyle
  • 95
  • 1
  • 1
  • 10
  • How many things are you going to be generating? – Aify Apr 13 '15 at 22:03
  • @Aify I will be most likely generating quite a lot actually. – Justin J. O'Boyle Apr 13 '15 at 22:07
  • @Aify Sorry for the vague answer, didn't realize enter was send. I'm using it for generating one-time keys when a user downloads a package from a specified server. – Justin J. O'Boyle Apr 13 '15 at 22:08
  • The initial problem that come to mind is that you're going to run out of memory eventually since you're storing everything in a list - especially if you're going to be generating a lot – Aify Apr 13 '15 at 22:08
  • Performance-wise, you are better of using something like a HashSet to store the previous values. The reason: the contains method of HashSet is constant time. This is opposed to a List which operates in linear time. – copeg Apr 13 '15 at 22:09
  • 2
    If you generate a lot of these, and they must never collide, why not make them sequential? – Andreas Apr 13 '15 at 22:09
  • @copeg Hmm, okay, that makes sense now thinking about it. Now, more back to the original question, let's say I wasn't generating very many- would the way I'm using recursion be efficient? – Justin J. O'Boyle Apr 13 '15 at 22:10
  • Andreas brings up a very good point. If it's sequential, you don't even need to keep track of the already used values - you just need to keep track of the last used value, and increment it, which is much faster than checking for duplicates. – Aify Apr 13 '15 at 22:11
  • Recursion is (basically) never efficient - use recursion to find a quick solution to a problem, then try and implement a non-recursive version of it. Anything you can do with recursion, you can do without recursion. – Aify Apr 13 '15 at 22:12
  • @Andreas I see your point completely- however I'm talking about a more general stance here. While that would _for sure_ be more efficient, I'd just like to know, if recursion like this would be too excessive for generating, say 100 random numbers at once, is this the most efficent way or is there something blatanly obvious that I'm missing? – Justin J. O'Boyle Apr 13 '15 at 22:17
  • The odds of the recursion going more than one or two levels are virtually nil. I wouldn't worry about the efficiency of the recursion nearly so much as the efficiency of the ArrayList. – pjs Apr 13 '15 at 22:18
  • @JustinJ.O'Boyle, instead of attempting to detect collisions (which would involve caching every ID generated), consider utilizing enough entropy that a collision is inconsequentially small. If you collect 160 bits from java.util.Random the chances of a collision after generating 10^17th IDs is 1 in 100 trillion. – Andreas Apr 13 '15 at 22:25
  • @Andreas Yeah, definitely worth considering. Thanks for helping me out, and everything's a lot more cleared up now! :) – Justin J. O'Boyle Apr 13 '15 at 22:27

3 Answers3

6

I suggest you make the generated IDs sequential numbers instead of doubles or uuids. If you want them to appear random to end users, display the sha1 of the number in base64.

Andreas
  • 4,937
  • 2
  • 25
  • 35
  • Thanks for your answer, this is most likely what I'll be using instead of just displaying sequential numbers. Just a little question- is it "off topic" to ask for the most efficient method of displaying the number in the method you explained? Or, should I do a little more research and create another question? – Justin J. O'Boyle Apr 13 '15 at 22:22
  • That's another question. It's fine to have follow-up questions, but if they're different questions make a separate post. – pjs Apr 13 '15 at 22:23
  • This is a good idea. Not sure whether it meets requirements; depends on whether it matters that future values can be guessed. – mattm Apr 13 '15 at 22:30
  • I had to change the recommended answer to Marco13's, not because I didn't like yours but because he provided a _lot_ of examples, thanks for your help though! – Justin J. O'Boyle Apr 13 '15 at 23:14
  • I would upvote your answer, but I'm new to the site and don't have enough reputation points, sorry! – Justin J. O'Boyle Apr 13 '15 at 23:20
1

Some points have already been discussed in the comments. To summarize and elaborate them here:

It is very unlikely that you create the same double value twice. There are roughly 7*1012 different double values (assuming that the random number generator can deliver "most" of them). For the UUIDs, the chance of creating the same value twice is even lower, since there are 2122 different UUIDs. If you created enough elements to have a non-negligible chance for a collision, you'd run out of memory anyhow.

So this approach does not make sense in practice.


However, from a purely theoretical point of view:

Performance

Using a List for this operation is not optimal. The "best case" (and by far the most common case) for you is that the new element is not contained in the list. But for the check whether the element is contained, this is the worst case: You'll have to check each and every element of the list, only to detect that the new element was not yet present. This is said to be linear complexity, or for short, O(n). You could use a different data structure where checking whether an element is contained can be done more quickly, namely in O(1). For example, you could replace the line

private static final List<Double> PREVIOUS = new ArrayList<Double>();

with

private static final Set<Double> PREVIOUS = new HashSet<Double>();

Performance and Correctness

(referring to the recursive approach in general here)

Performance

From a performance point of view, you should not use recursion when it can easily be replaced by an iterative solution. In this case, this would be trivial:

public static double generateUniqueDouble(List<Double> previous) {
    double d = RANDOM.nextDouble();
    while (previous.contains(d)) {
        d = RANDOM.nextDouble();
    }
    PREVIOUS.add(d);
    return d;
}

(it could be written a bit more compact, but that does not matter now).

Correctness

This is more subtle: When there are many recursive calls, then you might end up with a StackOverflowError. So you should never use recursion unless you can prove that the recursion will end (or better: That it will end "after a few steps").

But here's your main problem:

The algorithm is flawed. You cannot prove that it will be able to create a new random number. The chance that even a single new element is already contained in the collection of PREVIOUS elements is ridiculously low for double (or UUID) values. But it is not zero. And there is nothing preventing the random number generator from creating the random number 0.5 indefinitely, trillions of times in a row.


(Again: These are purely theoretical considerations. But not as far away from practice as they might look at the first glance: If you did not create random double values, but random byte values, then, after 256 calls, there would be no "new" values to return - and you would actually receive the StackOverflowError...)

Community
  • 1
  • 1
Marco13
  • 53,703
  • 9
  • 80
  • 159
  • Thank you for the extremely detailed comment, helped me a lot and I'm sure it will help others! I can't believe I forgot a while loop. However, I have one small question. This line: `while (previous.contains(d))` is obviously running the loop when the condition is true, however, don't we need the opposite, `while (!previous.contains(d))`? Anyway, thanks again for your expansive answer and I will revise my strategies :) – Justin J. O'Boyle Apr 13 '15 at 23:18
  • Maybe I'm lockng coffee, but I think `while (previous.contains(d))` should be correct: As long as the generated number is contained in the set, a new one should be created. – Marco13 Apr 14 '15 at 08:29
  • Ahh, my bad mate, just reread it. Was really tired last night, my apologies – Justin J. O'Boyle Apr 14 '15 at 21:01
0

It would be better to use a hash table than a list. Generate your candidate value, check for a collision in the hash table, and accept it if there is no collision. If you use a list, generating a new value is an O(n) operation. If you use a hash table, generating a new value is an O(1) operation .

mattm
  • 5,851
  • 11
  • 47
  • 77
  • Hmm, I will most certainly look more into hash tables for this. However, when you say "accept it if there's no collision", what happens if it _is_ a collision? Should I use recursion like I did in the examples? – Justin J. O'Boyle Apr 13 '15 at 22:17
  • If there's a collision then you generate a new random value and try again. Doing so with a loop would be more efficient than recursion, but honestly I doubt you'd see the difference if you did timings. – pjs Apr 13 '15 at 22:21