1

This is a follow-up question to the one I asked previously, Is this JS Unique ID Generator Unreliable? (Getting collisions).

In the scriptlet below I'm generating 10000 random numbers using 2 methods. Method 1 is a straight random number up to 10^6, while Method 2 concatenates that random number up to 10^6 (same idea as in [1]) with the current JS Date().time() timestamp. Also there's Method [3] which only does the Math.round on the RNG rather than the whole concatenated result.

My question is, if you keep clicking the test button, you see that [1] always produces 10000 unique numbers but [2] produces ~9500 no matter what. [3] produces ~9900 but never the max either. Why is that? The chances of getting a +/-1 from a previous random number in [0..10^6] and having that mixed with the timestamp of exactly the opposite +/-1 for the timestamp concatenation are impossible. We are generating pretty much on the same millisecond in a loop. 10^6 is a huge limit, much bigger than in my original question, and we know that's true because Method [1] works perfectly.

Is there truncation of some kind of going on, which trims the string and makes it more likely to get duplicates? Paradoxically, a smaller string works better than a larger string using the same RNG inside it. But if there's no truncation, I would expect results to be 100% as in [1].

function run() {
var nums1 = new Set(), nums2 = new Set(), nums3 = new Set();

for (var i = 0; i < 10000; i++) {
   nums1.add(random10to6th());
}
for (var i = 0; i < 10000; i++) {
   nums2.add(random10to6th_concatToTimestamp());
}
for (var i = 0; i < 10000; i++) {
   nums3.add(random10to6th_concatToTimestamp_roundRNGOnly());
}
console.clear();
console.log('Random 10^6 Unique set: ' + nums1.size);
console.log('Random 10^6 and Concat to Date().time() Unique set: ' + nums2.size);
console.log('Random 10^6 and Concat to Date().time(), Round RNG Only Unique set: ' + nums3.size);

function random10to6th() {
   return Math.random() * Math.pow(10, 6);
}

function random10to6th_concatToTimestamp() {
   return Math.round(new Date().getTime() + '' + (Math.random() * Math.pow(10, 6)));
}
}

function random10to6th_concatToTimestamp_roundRNGOnly() {
   return new Date().getTime() + '' + Math.round(Math.random() * Math.pow(10, 6));
}
<button onclick="run()">Run Algorithms</button>  
<p>(Keep clicking this button)</p>
JLRishe
  • 99,490
  • 19
  • 131
  • 169
gene b.
  • 10,512
  • 21
  • 115
  • 227
  • You can store the number as string as thousands (groups of three elements) up the the maximum `.length` of an array see [How do I add 1 to a big integer represented as a string in JavaScript?](https://stackoverflow.com/questions/43614407/how-do-i-add-1-to-a-big-integer-represented-as-a-string-in-javascript) – guest271314 Jan 19 '18 at 19:16
  • `random10to6th_concatToTimestamp_roundRNGOnly()` returns strings instead of numbers which are not affected by precision issues. – le_m Jan 19 '18 at 19:19
  • @le_m `Math.round(Math.random() * Math.pow(10, 6))` does not affect precision issues? – guest271314 Jan 19 '18 at 19:24
  • `Math.round(Math.random() * Math.pow(10, 6))` is more likely to have collisions than `Math.random() * Math.pow(10, 6)` because rounding limits the possible set of values to 10^6 possible values. – JLRishe Jan 19 '18 at 19:26
  • If the concern is collisions for random values you can include alphabetic characters within the resulting string; i.e.g., you can create and revoke `Blob URL`s N times. See also [How would one generate a MAC address in Javascript?](https://stackoverflow.com/questions/24621721/how-would-one-generate-a-mac-address-in-javascript/) – guest271314 Jan 19 '18 at 19:30
  • @guest271314 I should have said "...not affected by precision issues caused by limited integer precision". – le_m Jan 19 '18 at 19:33
  • Another approach to generate unique random values that cannot collide is to randomize the output of N-M, see [Random number, which is not equal to the previous number](https://stackoverflow.com/questions/40056297/random-number-which-is-not-equal-to-the-previous-number/) – guest271314 Jan 19 '18 at 19:38

2 Answers2

3

Is there truncation of some kind of going on, which trims the string and makes it more likely to get duplicates?

Yes, simply by rounding a random number, you cut off the fractional digits. This reduces the number of possible outcomes compared to the non-rounded random number.

In addition to that, you concatenate a timestamp (13 digits) with a value between 0 and 1000000 (1 to 7 digits). So your concatenated result will have a total number of 14 to 20 digits, but JavaScript's number datatype is of limited precision and represents integers faithfully up to about 16 digits only (see Number.MAX_SAFE_INTEGER).

Example: Let's assume the timestamp is 1516388144210 and you append random numbers from 500000 to 500400:

+'1516388144210500000' == 1516388144210500000
+'1516388144210500100' == 1516388144210500000
+'1516388144210500200' == 1516388144210500000
+'1516388144210500300' == 1516388144210500400
+'1516388144210500400' == 1516388144210500400

You can see that, when converting those strings to numbers, they get rounded to the nearest available IEEE-754 double-precision (64 bit) number. This is because 1516388144210500000 > Number.MAX_SAFE_INTEGER.

le_m
  • 19,302
  • 9
  • 64
  • 74
  • but the thing is, I'm concatenating into a string, right? I'm doing `+ '' +` so I'm not dealing with numbers anymore. – gene b. Jan 19 '18 at 19:19
  • Well, no, your second function at least - the one with the issue - returns `Math.round(...)` which converts the argument into a number. – le_m Jan 19 '18 at 19:21
  • Got it. I added a 3rd one later which is string-based. But that one still isn't 100% either. – gene b. Jan 19 '18 at 19:22
  • 1
    Yes, because you round the random number, which is in fact a truncation of many random digits compared to the non-rounded random number (first function). – le_m Jan 19 '18 at 19:23
0

I think there are a number of issues in play here. I don't know which or to what degree each of the items below contribute to the observed difference, just that they are things that might explain the results.

One is because you're concatenating a number with a string with a number and then coercing the value back to a number as part of rounding the result. It would be very easy to feed unexpected results into the Round function (which in itself might cause collisions due to floating precision and such outlined below)

Second, I think that you actually reduce the randomness of the resulting number when you concatenate the timestamp. The function is likely to be called many, many times every second; if it's invoked at a rate > Date.getTime() precision the value returned will be identical to one generated in a previous loop iteration.

Third, unless I missed something, have you considered that the random number gen is only guaranteed to be pseudo-random? Precision and digit limits play a factor when dealing with large values like in the code you posted. Since the random-est part of the number is tacked onto the least significant part, it is more likely to be truncated, chopped, or modified.

Try inverting your concatenation and see the results (there's only about 4 or so collisions). The collisions are accounted for by the reasons outlined by me and @le_m's answers.

function run() {
var nums1 = new Set(), nums2 = new Set()

for (var i = 0; i < 10000; i++) {
   nums1.add(random10to6th());
}
for (var i = 0; i < 10000; i++) {
   nums2.add(random10to6th_concatToTimestamp());
}
console.clear();
console.log('Random 10^6 Unique set: ' + nums1.size);
console.log('Random 10^6 and Concat to Date().time() Unique set: ' + nums2.size);

function random10to6th() {
   return Math.random() * Math.pow(10, 6);
}

function random10to6th_concatToTimestamp() {
   return Math.round((Math.random() * Math.pow(10, 6)) + '' + new Date().getTime());
}
}
<button onclick="run()">Run Algorithms</button>  
<p>(Keep clicking this button)</p>
Josh E
  • 7,390
  • 2
  • 32
  • 44