5

I'm working on a project, in which I should generate at least 70000 codes which contain 8 alphanumeric characters. The codes must be unique. currently I am using php to generate these codes with the following function :

function random_unique_serial($length, PDO $conn) {
    $codeCheck=FALSE;
    while (!$codeCheck) {
        $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
        $charactersLength = strlen($characters);
        $randomCode = '';
        for ($i = 0; $i < $length; $i++) {
            $randomCode .= $characters[rand(0, $charactersLength - 1)];
        }
        $sql = "SELECT * FROM codes WHERE code=:code";
        $st = $conn->prepare($sql);
        $st->bindvalue(":code", $randomCode, PDO::PARAM_STR);
        $st->execute();
        $count = $st->rowcount();
        if ($count==0) {
            $codeCheck=TRUE;
        } else {
            $codeCheck=FALSE;
        }
    }
    return $randomCode;
}

As you see this codes checks the database for every single code generated to make sure it is not a duplicate. This should work theoretically. However this is very slow and causes the request to time out. I tried increasing execution time but that also didn't help.

Then I decided to use a database side approach and used this solution : Generating a random & unique 8 character string using MySQL

This is also very slow and some of the generated codes are less than 8 characters long.

could you please suggest a better solution?

Community
  • 1
  • 1
MehdiB
  • 870
  • 12
  • 34

4 Answers4

7

Create your table structure:

CREATE TABLE t (code CHAR(8) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL UNIQUE);

Define a PHP function to generate a random string:

function random_string(integer $length = 8): string {
    return bin2hex(mcrypt_create_iv(ceil($length/2), MCRYPT_DEV_URANDOM));
}

Use PHP to build a multi-value INSERT statement, ram that into the database, count how many were inserted, and repeat until the required number are inserted:

function insert_records(\PDO $pdo, integer $need = 70000): void {
    $have = 0;
    while ($have < $need) {
        // generate multi value INSERT
        $sql = 'INSERT IGNORE INTO t VALUES ';
        for ($i = 1; $i < $need; $i++) {
            $sql .= sprintf('("%s"),', random_string());
        }
        $sql .= sprintf('("%s");', random_string());

        // pass to database and ask how many records were inserted
        $result = $pdo->query($sql);
        $count  = $result->rowCount();

        // adjust bookkeeping values so we know how many we have and how many
        // we need
        $need -= $count;
        $have += $count;
    }
}

On my machine (Amazon Linux c2.small), the run time for 70k records is about 2 seconds:

real    0m2.136s
user    0m1.256s
sys     0m0.212s

The relevant tricks in this code, to make it fast, are:

  • Sending the minimum number of SQL statements necessary to generate the needed number of records. Using a multi-value insert - INSERT INTO ... VALUES (), (), ... (); - really helps this as it minimizes the total amount of statement processing MySQL has to do and it tells us how many records were inserted without having to do another query.
  • Using INSERT IGNORE to avoid having to check for the existence of every single code we insert, which is really really expensive.
  • Using the fastest possible string generating function we can for our needs. In my experience, mcrypt_create_iv is a fast generator that is cryptographically secure, so it provides an ideal balance of security and performance.
  • Using the ASCII character set and fixed width CHAR to remove unnecessary byte overhead and UNIQUE to enforce de-duplication.
bishop
  • 37,830
  • 11
  • 104
  • 139
1

I'd do that with mysql alone, a stored procedure will help - you can still create and call that with php. The stored procedure uses the substring of a md5 hash, created from rand(). The column where the string is to be inserted needs to be unique. Replace table name and column in in this part:

insert ignore into foo (`uniqueString`)

delimiter //
create procedure createRandomString (in num int)
  begin
    declare i int default 0;
    while i < num do
      insert ignore into foo (`uniqueString`) values (substr(md5(rand()), 1, 8));
      set i = i + 1;
    end while;
  end //
delimiter ;

call createRandomString (70000);

I did a quick test, I got 69934 random unique strings inserted on a remote db (from the 70000 runs) within 10s 603ms. Running the same procedure with 80000 as parameter

call createRandomString(80000);

runs 12s 434ms for me, inserting 77354 rows - so you have at least 70000 in little time.

Will produce results like this:

enter image description here


If you want to make sure to have exactly the number of rows inserted as called, use this (but note to set the max_sp_recursion_depth to what it was before after calling the procedure, default is 0):

delimiter //
create procedure createRandomString2 (in num int)
  begin
    declare i int default 0;
    while i < num do
      insert ignore into foo (uniqueString) values (substr(md5(rand()), 1, 8));
      set i = i + 1;
    end while;
    if (select count(id) from foo) < num then
      call createRandomString2(num - (select count(id) from foo));
   END IF;
  end //
delimiter ;

set max_sp_recursion_depth = 100; 
call createRandomString7 (70000);
set max_sp_recursion_depth = 0;
baao
  • 71,625
  • 17
  • 143
  • 203
0

Here's one idea...

Here I'm inserting (approx.) 16, unique, 3-character (0-9/a-z) strings...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table (my_string CHAR(3) NOT NULL PRIMARY KEY);

INSERT INTO my_table 
SELECT CONCAT(SUBSTR('0123456789abcdefghihjlmnopqrstuvwxyz',(RAND()*35)+1,1)
             ,SUBSTR('0123456789abcdefghihjlmnopqrstuvwxyz',(RAND()*35)+1,1)
             ,SUBSTR('0123456789abcdefghihjlmnopqrstuvwxyz',(RAND()*35)+1,1)
             ) x;


//Repeat this block as necessary

INSERT IGNORE INTO my_table 
SELECT CONCAT(SUBSTR('0123456789abcdefghihjlmnopqrstuvwxyz',(RAND()*35)+1,1)
             ,SUBSTR('0123456789abcdefghihjlmnopqrstuvwxyz',(RAND()*35)+1,1)
             ,SUBSTR('0123456789abcdefghihjlmnopqrstuvwxyz',(RAND()*35)+1,1)
             ) x
          FROM my_table;

//End of block


SELECT * FROM my_table;
+-----------+
| my_string |
+-----------+
| 0he       |
| 112       |
| 24c       |
| 322       |
| 4b7       |
| 7vq       |
| as7       |
| g7n       |
| h66       |
| i54       |
| idd       |
| m62       |
| mqt       |
| obh       |
| x75       |
| xz4       |
+-----------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57
0

Eight digit numbers are guaranteed unique: 00000000, 00000001, 00000002, ... If you don't want the codes so obvious, then select eight different sets of ten alphanumeric characters to replace the ten digits in a given position. There will still be a pattern, but it will be less obvious: ql4id78sk, ql4id78s3, ql4id78sa, ...

Beyond that, you could encrypt the original numbers, and the encryptions are guaranteed unique. A 32 bit block cypher will produce four byte results, giving eight hex characters.

rossum
  • 15,344
  • 1
  • 24
  • 38