2

I am looking for an efficient way to generate 5 million unique codes with 7 characters (letters, numbers, special chars).

Basically, my idea was to generate a table with a unique constraint. Then to generate a code, insert it into the database, see if it is "accepted" (meaning a new code) until we have 5 million unique codes.

Alternatively they idea was to generate an array with 5 million unique codes to insert them afterward at once into the database to see how many of the codes make it into the database (are unique).

The third option was to create one code, check if it already exists, if not insert it into the database.

My question now is what method I should use - there might be a problem I oversee. Or is there a better way?

Thanks a lot!

Zeitounator
  • 38,476
  • 7
  • 53
  • 66
user1658080
  • 631
  • 1
  • 7
  • 18
  • you can use uuid - https://en.wikipedia.org/wiki/Universally_unique_identifier. see collision block. and just use it as an unique key in your db – myxaxa May 23 '19 at 08:42
  • 2
    @myxaxa UUIDs aren't 7 digits… – deceze May 23 '19 at 08:43
  • 4
    What prerequisites do the codes have to full fill? Because if there are none, just use 0000001 - 5000000. – Doktor OSwaldo May 23 '19 at 08:49
  • 1
    other solution : Create and check your codes by bunch of 1000 or 10000 or something like that before inserting them. Inserting them one by one will take age with database. Creating 5 millions will take some memory and if it crash before end you will have to restart all the job. Programming is made of compromise. – Shim-Sao May 23 '19 at 09:07
  • @DoktorOSwaldo - we want to use numbers, letters, special-chars; upper and lowercase. – user1658080 May 23 '19 at 10:19
  • What else is the requirement? Can they be *contiguous*? E.g. `0000001`, `0000002` etc, or must they be spread out randomly? – deceze May 23 '19 at 10:21
  • @deceze Randomly generated codes, using numbers, letters, special chars. Like we want to generate 5 million passwords with 7 characters; therefore contiguous numbers would be quite unsafe ;) – user1658080 May 23 '19 at 10:22
  • @user1658080 but why do they have to be unique then? Seems like a bad practice somewhere down the chain – Doktor OSwaldo May 23 '19 at 10:47
  • @DoktorOSwaldo because the codes are used as a kind of voucher-coupon and therefore need to be unique + hard to crack / guess. – user1658080 May 23 '19 at 10:52
  • It might be an idea to generate a dump file with php using `INSERT IGNORE` and import afterwards using mysql tool. Use a *unique* column for the random data. Don't check anything on php side, as it might consume too much memory when working with huge array. Insert a bit more as needed and afterwards delete where autoincrement id > 5m. An alternative to dump file can be to use PHP CLI directly (without dump) where at least the script runtime wouldn't be a problem. Interesting question :) – bobble bubble May 23 '19 at 11:25
  • Since you will use the IDs as coupon codes, do you expect users to enter those codes directly? Also, why seven characters and not more, if the purpose is to make those codes hard to guess? – Peter O. May 24 '19 at 08:32

4 Answers4

3

Pick an appropriate function to generate one random code; for illustration purposes I'll be using this:

function generateCode() {
    return substr(bin2hex(random_bytes(4)), 0, 7);
}

See https://stackoverflow.com/a/22829048/476 and other answers in there to pick something that works for you. The important point is that it uses a good source of randomness, either random_bytes, random_int, openssl_random_pseudo_bytes or /dev/urandom. This minimises the chance of two calls to this function producing the same output.

From there, simply use array keys to deduplicate the values:

$codes = [];

while (count($codes) < 5000000) {
    $codes[generateCode()] = null;
}

$codes = array_keys($codes);

If generateCode is sufficiently random, there should be few collisions and there shouldn't be too much overhead in generating codes this way. Even if, this is presumably a one-time operation, and efficiency isn't paramount. 5 million short strings should certainly fit into memory without much problem. You can then insert them all into the database in a batch.

deceze
  • 510,633
  • 85
  • 743
  • 889
3
function generateRandomString($length = 7) {

    // you can update these with new chars
    $characters = '!@#$%^&*()_+0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $charactersLength; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

Now use an array to store the codes:

$codes = array();
while(count($codes)!=5000000){
   $code =  generateRandomString();
   $codes[$code] =  $code;
}

$codes key and value, both have the same code.

Danyal Sandeelo
  • 12,196
  • 10
  • 47
  • 78
3

Given the purpose for which you're generating unique identifiers (as hard-to-guess coupon codes), I want to say that you should generate a unique identifier that combines a "unique" part and a "random" part.

  • The "unique" part can be a monotonically increasing counter (such as an auto-incremented row number in supporting databases), which can optionally serve as the seed of a full-period linear congruential generator (which cycles pseudorandomly through all possible values in its period before repeating).
  • The "random" part is simply a random number generated with a cryptographic random number generator (which for PHP is random_int). In general, the longer the random part is, the less predictable it will be.

Moreover, for the purposes of generating unique coupon codes, there is little reason to limit yourself to 7-character codes, especially if end users won't be required to enter those codes directly. See also this question.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
1

Should the codes you wanted needed to be inserted in the database?

It would have been better not to constantly request to the db and try if it is unique.

You can store the codes to an array, first before putting it to the db.

Pseudo-code:

  1. Generate unique 5 million codes, inserted in the hash table or an array. // as you insert a new one check the hash-table if it exists.

  2. You then insert this hash table or array in the database now.

CedricYao
  • 167
  • 1
  • 12