2

I have a unique field in a SQL table that currently has 200K rows

I use randomString(6, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ') to insert data on that field, I have too many unique conflict error when I want to insert new rows

In my log , I see that randomString generated HEGDDX string today but it generated in 3 months ago too and I have an error on insert

'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' has 36 character , I generate 6 length random string, So there is 36^6=2176782336 = 2.17E9 possible cases, So 200K rows in 2 billion has 0.00009 possibility for duplication

Is 0.00009 big enough for too may errors? is Math.random bad random generator? what is alternative for me?

const randomString = function(length, chars) {
  let str = '';
  const charsLen = chars.length;

  for (let i = 0; i < length; i++) {
    str += chars.charAt(Math.floor(Math.random() * charsLen));
  }
  return str;
}
Pac0
  • 21,465
  • 8
  • 65
  • 74
WebMaster
  • 3,050
  • 4
  • 25
  • 77
  • 2
    avoid using custom random generators. use something like [nanoid](https://github.com/ai/nanoid) or [uuid](https://github.com/uuidjs/uuid) – Kharel Jul 21 '20 at 10:12
  • I dont what use uuid or .... , I just need some random string , it is coupon code my users fill it later in an input – WebMaster Jul 21 '20 at 10:15
  • so you want unique coupon code generated for each client which is 6 char length? – Kharel Jul 21 '20 at 10:17
  • 2
    google "birthday paradox". 2 Billion is not so much at the end. – Pac0 Jul 21 '20 at 10:17
  • I've run your code multiple times and created a `Set` from it. The `size` of the set was always around 10 less than 200k – adiga Jul 21 '20 at 10:23
  • @Pac0 How much is enought? – WebMaster Jul 21 '20 at 10:38
  • 1
    Probably a lot more. But since you have some requirements for having this small and user-friendly, I would suggest a retry-strategy (check for "non-unique errors", and retry up to 5 or 10 times), that would probably be enough. Also, you may want to remove coupons when they are used to avoid cluttering even more the space. You can also change strategy, use non-unique coupon, with some kind of backing fields "hasCoupon" and "isCouponUsed" to check if a particular user has and has ualready used his coupon. – Pac0 Jul 21 '20 at 10:41
  • Why are you even generating random identifiers in the application? Why not create a sequence in the database? – Bergi Jul 21 '20 at 12:00
  • 1
    Does this answer your question? [How can I generate a unique, small, random, and user-friendly key?](https://stackoverflow.com/questions/34708/how-can-i-generate-a-unique-small-random-and-user-friendly-key) Especially see [my answer](https://stackoverflow.com/questions/34708/how-can-i-generate-a-unique-small-random-and-user-friendly-key/62920385#62920385). – Peter O. Jul 21 '20 at 12:11

2 Answers2

2

At first sight, your implementation seems ok.

The JS builtin Math.random may not be "cryptography-random-safe", but it's fine for your use-case.

The problem lies in the math, it's counter-intuitive that with billions of possibilities you get collisions with a few hundred thousands. This "paradox" is closely related to the birthday paradox. For instance, this blog post is very close to your issue.

Potential solutions

Since it's supposed to be a user-friendly number, you clearly don't want to use a UUID / GUID or such.

Instead I suggest the following options:

  • Use a retry strategy. That may seem a poor hack to fix the problem, but I think it is appropriate in this case. Just catch the DB error for non-unique value on insert, and generate a new one. Limit the retries at 10 times or such, because you want to avoid infinite error loop if something else goes wrong. The caveat of this basic approach is that may lead to some rare slowness if your db itself is slow to respond, for those cases.

  • You could also just load all existing coupons already generated in memory before insert and do the generation-retry in the code instead of waiting an error from db. That would make you read all coupons first, so you'd better index them.

  • If performance is more critical, you could even use a mix of the two: load a global cache of all coupons at regular intervals that fits your case (every hour, every day, etc...), so you can first quickly check in this list, without doing a big query in db. But collision may still happen since some values may have been added in the meantime, so you check for errors and retry.

  • You could also change strategy and don't enforce uniqueness. You need to check this with your requirements, but you could add some fields or a "child" "coupon" table for users.

  • Get some ideas to generate those values in DB here: Generating a random & unique 8 character string using MySQL (it's for MySQL, but some ideas could apply for all DBs)

Pac0
  • 21,465
  • 8
  • 65
  • 74
0
  1. You can use uuid. I have used it in many applications of mine.
  2. If the rate of data coming is slow, not many requests are coming at the same time on the same server then you can also use the timestamp as an id too.
cipher94
  • 203
  • 1
  • 12