1

I am attempting to generate unqiue numeric ids to be used as primary keys in a Mysql database. I need to generate them outside of the database because of the distributed nature of the system.

Here is my attempt:

@RunWith(MockitoJUnitRunner.class)
public class AbstractEntityTest {

    @Test
    public void testGenerateUniqueId() {

        val ids = new HashSet<Long>();
        val duplicates = new HashSet<Long>();
        for (int i = 0; i < 1000000; i++) {
            val id = System.currentTimeMillis() +
                    (ThreadLocalRandom.current().nextLong(9999999) + ThreadLocalRandom.current().nextLong(9999999));
            if (!ids.add(id)) {
                duplicates.add(id);
            }
        }

        System.out.println("ids: " + ids.size());
        System.out.println("Duplicates: " + duplicates.size());
        assertThat(duplicates).isEmpty();
    }
}

The result is:

ids: 967265
Duplicates: 31939

Can anyone suggest a better way of generating a truely unique long in Java?


A solution appears to be:

@Test
    public void testGenerateUniqueId_withUUID() {

        val ids = new HashSet<Long>();
        val duplicates = new HashSet<Long>();
        for (int i = 0; i < 1000000; i++) {
            val id = UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE;
            if (!ids.add(id)) {
                duplicates.add(id);
            }
        }

        System.out.println("ids: " + ids.size());
        System.out.println("Duplicates: " + duplicates.size());
        assertThat(duplicates).isEmpty();
    }

The result is:

ids: 1000000
Duplicates: 0

Here is 5 iterations of 30 million generations:

@Test
    public void testGenerateUniqueId_withUUID() {

        val iterations = 5;

        for (int j = 0; j < iterations; j++) {
            val ids = new HashSet<Long>();
            val duplicates = new HashSet<Long>();
            for (int i = 0; i < 30000000; i++) {
                val id = UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE;
                if (!ids.add(id)) {
                    duplicates.add(id);
                }
            }

            System.out.println(String.format("Iteration %s of %s", j + 1, iterations));
            System.out.println("ids: " + ids.size());
            System.out.println("Duplicates: " + duplicates.size());
            assertThat(duplicates).isEmpty();
        }
    }

The result is:

Iteration 1 of 5
ids: 30000000
Duplicates: 0
Iteration 2 of 5
ids: 30000000
Duplicates: 0
Iteration 3 of 5
ids: 30000000
Duplicates: 0
Iteration 4 of 5
ids: 30000000
Duplicates: 0
Iteration 5 of 5
ids: 30000000
Duplicates: 0

I have added in the current time in milliseconds to assist with uniqueness outside of the running process.

@Test
    public void testGenerateUniqueId_withUUID_andCurrentTimeMilliseconds() {

        val iterations = 5;

        val duplicates = new HashSet<Long>();
        for (int j = 0; j < iterations; j++) {
            val ids = new HashSet<Long>();
            for (int i = 0; i < 30000000; i++) {
                val id = System.currentTimeMillis() + UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE;
                if (!ids.add(id)) {
                    duplicates.add(id);
                }
            }

            System.out.println(String.format("Iteration %s of %s", j + 1, iterations));
            System.out.println("ids: " + ids.size());
            System.out.println("Duplicates: " + duplicates.size());
            assertThat(duplicates).isEmpty();
        }
    }
crmepham
  • 4,676
  • 19
  • 80
  • 155
  • 5
    "*I need to generate them outside of the database because of the distributed nature of the system*" If the system is distributed but they all share a database then the database is the place to generate the ID. MySQL can atomically generate sequential integer IDs. Having multiple distinct processes generate IDs is only going to lead to collisions. – Michael Feb 20 '21 at 14:40
  • @Michael Unfortunately the ORM layer does not play nicely with auto generated ids. What we are seeing is the same ids being allocated to multiple entities on the different services that persist the same object type, when using auto-increment ids. – crmepham Feb 20 '21 at 14:43
  • 2
    This is an XY problem. You should be asking "*how do I get ____ ORM framework to play nicely with auto generated ids*". I can't believe any ORM framework worth using would struggle with this incredibly basic use-case. It's more likely that you are doing it wrong. – Michael Feb 20 '21 at 14:45
  • 2
    Why are you generating **random** numbers? Is there a problem with a monotonically increasing sequence? I suppose randomness is adding itself to your current puzzle. Your question doesn't mention anything about the need for randomness. – ernest_k Feb 20 '21 at 14:52
  • @ernest_k For security reasons, ideally. But I think we could live with sequencial if we had to. But I suspect we would have similar issues with syncing the next available id across services? – crmepham Feb 20 '21 at 14:56
  • Another solution, if your application is truly distributed... Clustering (or distributed computing) tools like Hazelcast have distributed unique ID generators (check for example [Hazelcast's FlakeIdGenerator](https://docs.hazelcast.org/docs/latest/javadoc/com/hazelcast/flakeidgen/FlakeIdGenerator.html)) – ernest_k Feb 20 '21 at 14:59
  • Michael is right that this seems like an XY problem. An ORM framework is supposed to handle the lower-level details of database access, and generating unique IDs is a low-level detail of database access. – kaya3 Feb 20 '21 at 15:14
  • Found another suggestion [here](https://stackoverflow.com/a/75702619/4049154). – Amal May 10 '23 at 12:13

2 Answers2

2

Considering that UUID is almost unique you can use a UUID and convert it to a long

long uniqueNum = UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE

If you want to be bullet free for uniqueness and you afford converting your id to string then you could just use

String uniqueId = UUID.randomUUID().toString();
Panagiotis Bougioukos
  • 15,955
  • 2
  • 30
  • 47
  • This seems to do the trick - No duplicates in 1000000 iterations. – crmepham Feb 20 '21 at 14:51
  • 1
    No, You can't really convert a UUID to a long this way and still keep the property "unique". – Johannes Kuhn Feb 20 '21 at 14:51
  • 2
    It would be worth discussing how much less unique this transformation would make the result. The logic that "unique input + some transformation = unique output" is spurious. What if the transformation was to replace every character with A? – Michael Feb 20 '21 at 14:51
  • @crmepham Starting with an integer variable initialised to 0 and adding 1 each time will *also* guarantee no duplicates in 1000000 iterations. It doesn't prove that it's sufficiently robust across multiple processes. – Michael Feb 20 '21 at 14:55
  • @Michael Yes but I need to guarantee that each service isn't assigning the next available number in sequence. – crmepham Feb 20 '21 at 14:58
  • @crmepham You missed the point. The point was that, clearly, if every process starts at zero and just counts up, they will certainly collide with one another. But within the context of a single process, which is all that you're testing, it would appear like a perfectly effective solution which guarantees no duplicates. What I am saying is not to assume that since you ran it 1million times and worked that it will be good enough. You could run the simple integer method 1million times and it might naively *appear* good enough even though it certainly isn't. – Michael Feb 20 '21 at 15:04
  • @Michael Right got it thanks. How can I increase the randomness of the number being generated? – crmepham Feb 20 '21 at 15:07
1

If possible, I suggest that you switch to UUID, which I think are better suited for your purpose. With UUID, you can just do:

UUID uuid = UUID.randomUUID();

Michael
  • 41,989
  • 11
  • 82
  • 128
  • If you cannot comment on questions, the correct thing to do is to build up enough reputation to be able to do so, by posting answers that *aren't* borderline comments. The rep cap for commenting is very low, it should not take long. – Michael Feb 20 '21 at 14:48