5

I'm working on a system that generates around 2 billion unique UUIDs a day. The UUIDs are generated using JavaScript \ Flash (AS3) on the client.

We recently noticed that our UUIDs are no where near unique. We have around 20% (!) of daily duplicates, most of which (relative to traffic volume) are coming from chrome.

I did some reading and learned that the pseudo-random generation (PRNG) algorithm implementation on most browsers, and specifically chrome, is flawed. Chromium and Node.js use the V8 javaScript engine, which implements an algorithm called MWC1616.

In theory, UUIDs generated using a good PRNG should have a 2132 probability for collision, but with MWC1616, under some very realistic scenarios, this probability is around 1:30000.

To solve the problem, I considered the following options:

  1. Generate the IDs on the server (using Go)
  2. Generate a stronger ID on the client, by hashing some info like IP, UA, timestamp, etc. with the UUID.
  3. Replacing Math.random() with a better random generator.

Since I prefer to keep things on the client and I don't want to re-invent the wheel and modify the UUID creation logic, I want to stick with option 3.

The good news are that on newer browsers, there is the getRandomValues api. Unfortunately, I need to support older browsers.

So my questions are:

  1. What is a good and reliable JavaScript polyfill for

crypto.getRandomValues()

(that doesn't use Math.random internally)?

  1. Does AS3 Math.random() use the browser's Math.random()? Does it implement the same algorithm itself?

  2. Does flash.crypto.generateRandomBytes() use Math.random()? Does it use crypto.getRandomValues()? If not, which algorithm does it implement and will it be a good solution for the same issue in AS3? If not, which AS3 crypto library would you recommend?

P.S. I highly recommend the articles I mentioned -1- -2- -3-. I was aware of the issues with Math.random() for many years, but this article truly made it clear to me for good.

Lizozom
  • 2,161
  • 2
  • 21
  • 38
  • 2
    AS3's Math.random() is browser independent but is still pseudo-random. AS3 has a UIDUtil.createUID() method but as stated in the docs "This UID will not be truly globally unique; but it is the best we can do without player support for UID generation.". I am using this class for the GUID generation and it works pretty well so far (with a few thousands generated IDs per day): http://snipplr.com/view/45247/as3-globally-unique-identifier-guid/ – Philarmon Mar 15 '17 at 12:53
  • @Philarmon - I suspect this won't be good enough for our load - The UIDUtil implementation is very similar to the implementation I currently have - using Math.random. Do you have any experience with crypto.generateRandomBytes? – Lizozom Mar 15 '17 at 13:02
  • 1
    Try `forge.random.getBytesSync(numbytes);` Seems forges's [random.js](https://github.com/digitalbazaar/forge/blob/master/lib/random.js) uses its own random generator when `window.crypto.getRandomValues()` is not available and can be forced using `forge.options.usePureJavaScript = true;` – pedrofb Mar 15 '17 at 13:08
  • Like I said - the UIDUtil is not good enough. But did you checked the other solution behind this link? It claims to generate globally unique ids and looks like there is no random stuff involved – Philarmon Mar 15 '17 at 20:48
  • *generateRandomBytes* : ``The random sequence is generated using cryptographically strong functions provided by the operating system. If the appropriate function is not available on an individual client computer or device, then an error is thrown.`` so it does not seems as a FLASH builtin process – payam_sbr Mar 16 '17 at 10:07
  • 1
    But you do realize that `Number` in JS and AS3 is IEEE-754 64 bit floating point? Obviously you can't have 2^128 unique values on 64 bits. And also I don't understand what "2^132 probability for collision" suppose to mean. The probability depends on how much bit you have available and how much ids you are generating. Important question is do you really need to have your ids to be generated randomly, especially if you have such large number of them? The only option to have true UUIDs in AS would be to use ByteArray to store them in actual 128 (an you would have to write your own RNG) – Paweł Audionysos Mar 16 '17 at 11:50
  • Can you say for sure, according to your experience, what exactly causes duplicates? Is generation of UUIDs by different clients at the same time makes this probability higher? I mean is it possible at all to have duplicates among UUIDs if they were all generated by the same client in different timestamps? Thanks. – berec Apr 17 '17 at 13:45
  • 1
    Thanks for that link to the v8 article about Math.random and xorshift128. Very enlightening! – broofa Jun 29 '17 at 01:04

1 Answers1

3

After spending more than a week researching this - my conclusion is: NEVER GENERATE UUIDs on the client. Just don't. Especially if you intend to scale.

For years, I knew that the browser's Math.random implementation was poor, but I didn't understand how bad was it, until we reached the scale of billions of events per day.

I decided to go with the easiest technical solution and moved UUID generation to the server. The percentage of duplicate IDs when down from ~25% a day to ~0.0008%.

P.S. Our server is implemented in Go. Node.js uses JavaScript V8 engine and might have the same issues. Though it seems like if you're using the latest Node.js, you should be ok.

Lizozom
  • 2,161
  • 2
  • 21
  • 38
  • 2
    I don't think it's necessary to go so far as to say, "never generate UUIDs on the client". There are a ton of benefits to creating a canonical ID at the same time you create a data entity which, in our increasingly mobile-centric world, happens more and more often on the client. And note, too, that you run into all the same issues generating IDs on the server as soon as you have to scale beyond a single server. The real takeaway here is to never assume your IDs are guaranteed to be unique. Put checks for this in place and fail gracefully when a collision does occur. – broofa Oct 29 '17 at 14:11