Generating long unique id with Murmur3 from google guava

Question

at the moment i am trying to generate unique identifiers of type long on the client side. I have a parent/child relationship where the parent already has a UUID as identifier. I want to consider the Parent-UUID for calculating a Child-Id of type long.

I have this implementation at the moment:

public static void main(String[] args) {

    /** Funnel. */
    final Funnel<UUID> UUID_FUNNEL = new Funnel<UUID>() {
        @Override
        public void funnel(UUID parentUUID, PrimitiveSink into) {
            final UUID tmpId = UUID.randomUUID();
            into
             // consider parent uuid
            .putLong(parentUUID.getMostSignificantBits())
            .putLong(parentUUID.getLeastSignificantBits())
             // consider tmp uuid
            .putLong(tmpId.getMostSignificantBits())
            .putLong(tmpId.getLeastSignificantBits());
        }
    };

    final UUID parentUUID = UUID.randomUUID();
    System.out.println(parentUUID.toString());

    for (int i = 0; i < 1000; i++) {
        final long childId = Hashing.murmur3_128().newHasher()
                                .putObject(parentUUID, UUID_FUNNEL)
                                .hash().asLong();
        System.out.println(childId);
    }
}

What do you think about this idea? Any suggestions are welcome.

I have already read this Question: How to generate unique Long using UUID

Why would this generate unique child ids? Wouldn't an `AtomicLong` with `getAndIncrement()` be simpler? — mfulton26, Jan 12 '17 at 13:18
A single `AtomicLong` can be used to generate a sequence of unique long values in a multi-threaded process. — mfulton26, Jan 12 '17 at 13:33
The clients should not go to a server to get the next id. It must be possible to work offline. — Julian Aliaj, Jan 12 '17 at 13:39
Oh, I see. Do you you need the child IDs to be unique across all children of all parents or only among its siblings? — mfulton26, Jan 12 '17 at 13:41
You eventually need to persist the child ID before sharing it, right? So why not let the server mint the child IDs? I guess I'm challenging the need for the problem. Creating a unique long across multiple processes, IMO, is risky as there is simply to high of a chance for a collision. — mfulton26, Jan 12 '17 at 13:57
Hashing a random number is no better than generating a random number. — Louis Wasserman, Jan 12 '17 at 16:50

score 4 · Answer 1 · edited Jun 20 '20 at 09:12

This won't really work. Surely no better than a random long.

Without tmpId: You only hash the parentUUID, so all children of the same parent get the same long.
With tmpId: You could use UUID.randomUUID().getLeastSignificantBits() or just random.nextLong() and save yourself all the work (hashing a random value leads to a random result, no matter what you add).

I have multple clients not only one.

Then ask a unique server. This includes some overhead which can be easily minimized using a hi-lo algorithm.

At the DB level the child id's must be unique.

Then forget it and let the DB generate the id. Every DB has some AUTOINCREMENT or SEQUENCE meant exactly for this.

In case you need the id in the client before you access the DB, ask the DB (and use the hi-lo algorithm in order to minimize the overhead).

Offline working

I just saw your comment:

The clients should not go to a server to get the next id. It must be possible to work offline.

This is a big pain. Any hashing you could do won't be better than a random long.

A set of one million random longs has a collision chance of about 1e-6, which might be acceptable. Note that due to the birthday paradox, the chance grows quadratically with the set size.
You could try to handle the offline created entities without an ID (using some other identifier), but this sounds like a big pain.
You could preallocate some IDs for each client. This sounds wasteful, but preallocating 100 ID for each of one million clients uses up less than 5% of all possible IDs.
You could switch to random UUIDs. Because of them being 128 bits long, the collision chance is practically zero even for billions of ID.

How are you getting that probability and growth rate? It doesn't sound quite right. — pvg, Jan 14 '17 at 06:28
@pvg Very roughly: `long` is `2**64`, i.e., about `1e18`, having `1e6` entries means about `1e12` pairs which implies a collision probability of about `1e-6`. This all is one or two orders of magnitude off, but IMHO good enough at this stage. For `1e7` entries the probability would be about `1e-4`. For `1e10` entries, I'd get a probability of `100`, which is a clear non-sense, but it's also too far from the area of usable sizes. Or do you see a concrete problem? — maaartinus, Jan 14 '17 at 20:11

Generating long unique id with Murmur3 from google guava

1 Answers1

Offline working