I need to calculate the daily number of unique users of an app.

The only way I can uniquely identify a user is via their UUID (this is externally supplied so I am forced to use it).

I know that my daily user counts are a couple of million users.

I'd like to use a bitset in Redis to do a population count, but in order for this to work, I'd need a way of narrowing my UUID so that it could comfortably fit into a long. I am aware of the potential for collisions but I am not concerned about precise numbers.

Has anyone done this in Java before? What I am after is how I could convert my UUID into something that could fit into a long.

seedhead

3 Answers

There are two methods on the UUID object which might benefit you.

getLeastSignificantBits() and getMostSignificantBits() both return a long. Take one of these longs as your answer (or some combination of the two, if you care).
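As a minimal sketch of the "combination" option, the two halves can be folded into a single long with XOR. The class and method names here (`UuidFold`, `foldToLong`) are made up for illustration, and the sample UUID is arbitrary.

```java
import java.util.UUID;

class UuidFold {
    // Fold the 128-bit UUID into 64 bits by XOR-ing its two halves,
    // so bits from both halves influence the result.
    static long foldToLong(UUID uuid) {
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }

    public static void main(String[] args) {
        // Arbitrary sample value; any UUID works.
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println(Long.toHexString(foldToLong(uuid)));
    }
}
```

Folding discards information, so distinct UUIDs can map to the same long; that is acceptable only because exact counts are not required here.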

David Harkness
Starkey
  • I would combine the two values using `^` (xor) or something similar if you're using the built-in Java version that puts the time in one value and the node and clock sequence in the other. – David Harkness Jul 15 '12 at 22:54
  • Thanks for the suggestion, unfortunately I have discovered the long returned from getLeastSignificantBits() and getMostSignificantBits() is still too large a value to be used in Redis bitsets. – seedhead Jul 16 '12 at 12:10
  • Take the output from one of these and mask it to cut down the number of bits. How many bits are you allowed? – Starkey Jul 16 '12 at 13:45
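Redis requires bitmap offsets to be below 2^32, so a full 64-bit long cannot be used directly as an offset. The sketch below masks the XOR of the two UUID halves down to a chosen number of bits. `MaskedOffset`, `toOffset` and the 24-bit size are illustrative assumptions, and `java.util.BitSet` stands in locally for the Redis bitmap.

```java
import java.util.BitSet;
import java.util.UUID;

class MaskedOffset {
    // Keep only the low `bits` bits of the XOR-folded UUID (1 <= bits <= 32).
    static long toOffset(UUID uuid, int bits) {
        long folded = uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
        long mask = (1L << bits) - 1;
        return folded & mask;
    }

    public static void main(String[] args) {
        int bits = 24; // hypothetical size: 2^24 possible offsets
        BitSet seen = new BitSet(1 << bits); // local stand-in for the Redis bitmap
        for (int i = 0; i < 1000; i++) {
            // The cast is safe because the offset is below 2^24.
            seen.set((int) toOffset(UUID.randomUUID(), bits));
        }
        // Population count: number of distinct offsets seen.
        System.out.println("approximate unique users: " + seen.cardinality());
    }
}
```

Fewer bits means a smaller bitmap but more collisions, so the count will undershoot more as the mask shrinks.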

You could hash your UUIDs down to ints or longs and use those values for your population count.

Have a look at `redis.clients.util.MurmurHash` in the Jedis Redis library. You can find it at https://github.com/xetorthio/jedis

Edit: sample

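        // Needs java.nio.ByteBuffer, java.util.UUID and redis.clients.util.MurmurHash.
        UUID uuid = UUID.randomUUID();
        // Pack both 64-bit halves of the UUID into a 16-byte buffer.
        ByteBuffer buf = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits());
        buf.flip(); // switch the buffer from writing to reading
        int useMe = MurmurHash.hash(buf, 123); // 123 is an arbitrary seed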
Jonas Adler
  • Thanks Jonas. I assume using something like MurmurHash has a risk (presumably low?) of collisions? I noticed in your example you put a seed value of 123. Is this a suitable value for hashing a UUID? – seedhead Aug 10 '12 at 10:07
  • Hi seedhead, sorry for the late answer but here it goes: Every hashing algo has the risk of collisions, but since you're doing a population count you should be fine. The redis lib uses '0x1234ABCD' as seed which should be fine – Jonas Adler Aug 21 '12 at 08:16
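To put a rough number on the collision risk discussed above: if n users are hashed uniformly into m slots, the expected number of distinct slots hit is approximately m(1 - e^(-n/m)). The sketch below evaluates that for two million users, assuming a 32-bit hash space and a hash that behaves uniformly. The class and method names are hypothetical.

```java
class CollisionEstimate {
    // Approximate expected number of distinct slots hit when n items
    // are hashed uniformly into m slots: m * (1 - e^(-n/m)).
    static double expectedSetBits(double n, double m) {
        // expm1 keeps precision when n/m is very small.
        return -m * Math.expm1(-n / m);
    }

    public static void main(String[] args) {
        double n = 2_000_000;        // "a couple of million" daily users, from the question
        double m = Math.pow(2, 32);  // assumed 32-bit hash space
        double undercount = n - expectedSetBits(n, m);
        System.out.printf("expected undercount: %.0f users%n", undercount);
    }
}
```

Under these assumptions the undercount is on the order of a few hundred users out of two million, which is small if precise numbers are not needed.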

A couple of million users per day is probably small enough to store directly, using the full UUID as a hash key. If that suits your needs, approximations can also be made using less memory.
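A minimal in-memory sketch of the exact approach, using a `HashSet` keyed by the full UUID as a local stand-in for whatever store is actually used. `ExactDailyCount` and its methods are illustrative names only. Two million UUIDs come to about 32 MB of raw key data at 16 bytes each, before any per-entry overhead.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class ExactDailyCount {
    // One set per day; the full UUID is the key, so there are no collisions.
    private final Set<UUID> seenToday = new HashSet<>();

    void record(UUID userId) {
        seenToday.add(userId); // repeat visits by the same user are ignored
    }

    int uniqueUsers() {
        return seenToday.size();
    }

    public static void main(String[] args) {
        ExactDailyCount counter = new ExactDailyCount();
        UUID returning = UUID.randomUUID();
        counter.record(returning);
        counter.record(returning);         // same user again
        counter.record(UUID.randomUUID()); // a different user
        System.out.println("unique users today: " + counter.uniqueUsers());
    }
}
```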

Joshua Martell