0

I need to generate unique filenames for uploaded files. I store the names in a database and when generating a filename check to make sure it's unique. I know there are a lot of questions on this subject on here, but what I'm trying to understand is why my script isn't working.

Her's my code for getting a filename and checking that it is unique:

do {
    $newName  = generateRandomString(10, '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ');

    $stmt = $this->db->prepare('SELECT id FROM images WHERE file_name = :newName');
    makeQuery($stmt, array(':newName' => $newName));
    $row = $stmt->fetch(\PDO::FETCH_ASSOC);
} while(!empty($row));

Where generateRandomString() is:

function generateRandomString($length, $characters) {
    $randomString = '';

    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, strlen($characters) - 1)];
    }

    return $randomString;
}

Now, when I run this with about 30,000 filenames in my database, it takes anywhere from a few seconds to literally a few minutes to return a filename.

With as many characters as I'm using in the filenames (0-9a-zA-Z) and with a length of 10, there should be a HUGE number of potential filenames (about 107 billion if I calculated it right). It doesn't seem like there should be any collisions at all, least of all the number I'm getting (an XDebug profile snapshot I analyzed said generateRandomString() ran over 100,000 times before returning!).

Why isn't this working and what can I do to fix it?

EDIT: Oops, I misinterpreted the xdebug data. It didn't take 100,000 function calls, it took 123,502 milliseconds (so time, not function calls).

Nate
  • 26,164
  • 34
  • 130
  • 214

2 Answers2

2

There are many problems with this code:

  • SELECT, then INSERT is prone to a race condition (between the two statements another process inserted the same ID). The clean way is to optimisticly insert a row, and retry on duplicate key errors, better still use a deterministicly unique function.
  • You prepare a new statement for every loop. The clean way is to prepare the statement once, then ececute it repeatedly with the different parameters. This is why they are called prepared statements
  • Your implementation uses PHP's rand() function, which produces wildly different qualities of randomness depending on PHP Version and OS. Use mt_rand();

I recommend you create the identifier inside the DB: See my answer to another SO question

Community
  • 1
  • 1
Eugen Rieck
  • 64,175
  • 10
  • 70
  • 92
  • I changed `rand()` to `mt_rand()` and after several thousand uploads not a single duplicate file name! Thank you for your other suggestions as well. – Nate Jul 25 '14 at 00:28
0

I think I got the solution for you:

http://kvz.io/blog/2009/06/10/create-short-ids-with-php-like-youtube-or-tinyurl/

All the IDs generated by this code will be Unique

Takoyaro
  • 928
  • 7
  • 14
  • 2
    The linked resource may be useful for the author, but (1) it's not really what the OP is asking about; and (2) you should include the solution in your answer, linking to the external resource. – Aleks G Jul 24 '14 at 18:05