7

I need to generate unique id's for multiple sentences in a longer narrative (where multiple users can be performing the same action, at the same time, on different machines).

I considered doing new Date().getTime() (and perhaps concatenating a username) but as the id's were generated in a loop whilst iterating over the sentences, I found duplicates were created (as generation could occur at the same millisecond).

So I am currently playing around with:

var random1 = Math.floor((Math.random() * 10000) + 1).toString(36);
var random2 = Math.floor((Math.random() * 10000) + 1);
var random3 = Math.floor((Math.random() * 10000) + 1);
var id = random1 + random2 + random3;
// generates things like:  
// 1h754278042
// 58o83798349
// 3ls28055962

It occurred to me though (admittedly, as someone who has not pondered unique/random/crypto issues much), that perhaps joining three random numbers isn't any more random that one random number?

Is generating and concatenating 3 Math.random() values more random than 1 Math.random() value?

This answer (https://security.stackexchange.com/a/124003) states:

If the random generator really produces random data then it will not matter.

But I'm not sure how that applies to the usage of Math.random().

Edit:

Scenario is client side on web and not for security, just to ensure that each sentence has a unique id in the database.

Edit:

I ended up implementing:

function guid() {
  function s4() {
    return Math.floor((1 + Math.random()) * 0x10000)
      .toString(16)
      .substring(1);
  }
  return s4() + s4() + '-' + s4() + '-' + s4() + '-' +
    s4() + '-' + s4() + s4() + s4();
}

var id = guid();

From: https://stackoverflow.com/a/105074/1063287

Also see comment on that answer:

Actually, the RFC allows for UUIDs that are created from random numbers. You just have to twiddle a couple of bits to identify it as such. See section 4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers: rfc-archive.org/getrfc.php?rfc=4122

Community
  • 1
  • 1
user1063287
  • 10,265
  • 25
  • 122
  • 218
  • Does this run client side on the web and are you plannong on using it for security? – Liam Jul 06 '16 at 15:52
  • Yes client side on web, and not for security - just to ensure that each sentence has a unique id in the database. – user1063287 Jul 06 '16 at 15:55
  • 7
    @user1063287 `Math.random()` purpose isn't to provide uniqueness – A. Wolff Jul 06 '16 at 15:56
  • Because anyone could simply access the console and change this variable. Thus deliberatly causing a colision. Could this provide someone access to data they're not supposed to access? Tl;DR this sounds like a bad idea to me – Liam Jul 06 '16 at 15:58
  • 1
    It's not "more random", but it increases the id space and therefore decreases the risk of collisions, but you can't guarantee there'll be none with this. – raphv Jul 06 '16 at 16:00
  • 1
    [JS 4 now supports guid](http://stackoverflow.com/a/105074/542251) which are for uniquness – Liam Jul 06 '16 at 16:01
  • 2
    Possible duplicate of [Create GUID / UUID in JavaScript?](http://stackoverflow.com/questions/105034/create-guid-uuid-in-javascript) – Liam Jul 06 '16 at 16:02
  • i often use alpha num for unique id your first random should work fine if you have 8 char and more ...but yea dont forget to validate it isnt alrdy taken on the server side – Alex C Jul 06 '16 at 16:03
  • @Liam I tried the linked solution and it generates id's like `"9c783a6f-9838-e4d1-1998-8a29805ca2f8"`. If this is a simple question to answer - is it possible to make the id shorter, like `1h754278042`, or is the length of such generated id's required to ensure uniqueness? – user1063287 Jul 06 '16 at 16:21
  • To answer my question, it seems the format is required, after reading link in comment on http://stackoverflow.com/a/105074/1063287 : "Actually, the RFC allows for UUIDs that are created from random numbers. You just have to twiddle a couple of bits to identify it as such. See section 4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers: [rfc-archive.org/getrfc.php?rfc=4122](http://www.rfc-archive.org/getrfc.php?rfc=4122)" – user1063287 Jul 06 '16 at 16:29
  • Why not use web crypto api? – Regis Portalez Jul 07 '16 at 05:56
  • I really wonder what you mean by "*more random*" (or whether you know yourself). And against which "1 Math.random() value" are you comparing? `Math.floor(Math.random()*1e12).toString(36)`? – Bergi Jul 07 '16 at 06:10
  • @user1063287 the length is for uniueness. If you shorten the values then you shorten the number of possible values which increases the chances of a collosion. GUID are not guaranteed to by unique it's just virtually impossible to generate a non-unique value. Uniqueness is hard – Liam Jul 07 '16 at 07:53
  • If you came up with an answer yourself you should [answer your own question](http://stackoverflow.com/help/self-answer) – Liam Jul 07 '16 at 07:54
  • Though you appear to be saying this is a duplicate to the one that I tagged, in which case you should close it as such – Liam Jul 07 '16 at 07:54

3 Answers3

2

The only thing you change by concatenating 3 uniformly distributed random strings is a larger range of possible values. The distribution is still uniform, so it's no more "random" but it does reduce the risk of collisions significantly. The number of possible values would then by 36^12, or 4.7383813e+18.

You'd get the same effect by concatenating 12 base-36 digits (0-9,A-Z).

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Actually the algorithm the OP chose to concatenate the strings was not uniform, as he did not ensure a constant length… – Bergi Jul 07 '16 at 06:12
1

Math.random() returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy.

Here's V8's implementation:

uint32_t V8::Random() {

// Random number generator using George Marsaglia's MWC algorithm.
static uint32_t hi = 0;
static uint32_t lo = 0;

// Initialize seed using the system random(). If one of the seeds
// should ever become zero again, or if random() returns zero, we
// avoid getting stuck with zero bits in hi or lo by reinitializing
// them on demand.
if (hi == 0) hi = random();
if (lo == 0) lo = random();

// Mix the bits.
hi = 36969 * (hi & 0xFFFF) + (hi >> 16);
lo = 18273 * (lo & 0xFFFF) + (lo >> 16);
return (hi << 16) + (lo & 0xFFFF);
}

Source: http://dl.packetstormsecurity.net/papers/general/Google_Chrome_3.0_Beta_Math.random_vulnerability.pdf

In other words 3 random values ane not more 'random' than 1.

Sergei Podlipaev
  • 1,331
  • 1
  • 14
  • 34
  • That implementation was [fixed](http://v8project.blogspot.com/2015/12/theres-mathrandom-and-then-theres.html) last year. – sh1 Jul 07 '16 at 05:54
  • Please be specific and give a definition of "(more) random" if even you have to put it in quotes. – Bergi Jul 07 '16 at 06:11
0

It's complicated.

Because of the specific way that you're using the random number generator, it is possible for the the first string to be the same while the second or third string are different. This means that you produce more unique strings than you would get with just a single call to Math.random(). More unique strings implies fewer collisions, which is what you were aiming for.

To confirm this, simply dump a lot of these strings to a file and then sort them and see if the second and third values always change with the first value or if they can change independently (you'll need to add separators to the output to see that).

This is an artefact of the way PRNG state is hidden; and after a few appends it will stop working. It should (no guarantees!) be after so many iterations that you won't be able to test it easily, so don't try to work it out empirically.

If you had a really primitive generator algorithm then you might find that whenever random1 was equal to X, then random2 would always be equal to Y and random3 would always be equal to Z; so if you had a collision at X, then implicitly Y and Z would also collide and so they wouldn't help.

But most PRNGs (and also the structure of your code as you take only 10000 distinct values from each call), have a state much larger than what they reveal in a single call. This means that even if random1 is X, random2 and random3 are still completely unpredictable and the collisions are made less likely by their presence.

However, by the time you get to random100 you should start to see that you can guess what it's going to be based on the values of all the other randomXs and it won't be making the string any more unique.

Then the whole problem runs off down a rabbit hole of seed quality and state size. Basically, it's plausible for the random number generator to be so weak that it can only produce as few as four billion unique strings, and probably far fewer in realistic situations. The random() function wasn't intended to solve the GUID problem, so there's a risk that it will fail spectacularly at it.

sh1
  • 4,324
  • 17
  • 30