1

I'm trying to create a URL similar to youtube's /v=xxx in look and in behavior. In short, users will upload files and be able to access them through that URL. This URL code needs to be some form of the database's primary key so the page can gather the data needed. I'm new to databases and this is more of a database problem than anything.

In my database I have a auto increment primary key which file data is accessed by. I want to use that number to to create the URL for files. I started looking into different hash functions, but I'm worried about collisions. I don't want the same URL for two different files.

I also considered using uniqid() as my primary key CHAR(13), and just use that directly. But with this I'm worried about efficiency. Also looking around I can't seem to find much about it so it's probably a strange idea. Not to mention I would need to test for collisions when ids are generated which can be inefficient. Auto increment is a lot easier.

Is there any good solution to this? Will either of my ideas work? How can I generate a unique URL from an auto incremented primary key and avoid collisions?

I'm leaning toward my second idea, it won't be greatly efficient, but the largest performance drawbacks are caused when things need to be added to the database (testing for collisions), which for the end user, only happens once. The other performance drawback will probably be in the actual looking of of chars instead of ints. But I'm mainly worried that it's bad practice.

EDIT:

A simple solution would to be just to use the auto incremented value directly. Call me picky, but that looks kind of ugly.

SpaceFace
  • 1,492
  • 3
  • 15
  • 29
  • How many records are you anticipating you will have? – sberry Aug 16 '12 at 16:50
  • 1
    Why do you think using the auto-incremented value is ugly? – Briguy37 Aug 16 '12 at 16:52
  • @sberry That really depends, not many from the start. But I guess it's worth noting files will be removed after 2 months. So being optimistic no more than a few thousand at a time. – SpaceFace Aug 16 '12 at 16:54
  • @Briguy37 I meant for the URL it's ugly, `www.example.com/id=123` just seems strange to me, that's all – SpaceFace Aug 16 '12 at 16:55
  • 1
    I'm a bit confused, because that is in the same format as your desired YouTube example url of `/v=xxx`. If you want letters in the id, you could always convert the id to hex or a custom alpha-numeric system and start the id count at a value that looks good to you. – Briguy37 Aug 16 '12 at 17:01

4 Answers4

1

Generating non colliding short hash will indeed be a headache. So, instead the slug format of Stackoverflow is very promising and is guaranteed to produce non duplicate url.

For example, this very same question has

https://stackoverflow.com/questions/11991785/unique-url-from-primary-key

Here, it has unique primary key and also a title to make it more SE friendly.


However as commented, they are few previously asked question, that might clear out, why? what you are trying is better left out.

  1. How to generate a unique hash for a URL?
  2. Create Tinyurl style hash

Creating short hashes increases the chances a collision a lot, so better user base64 or sha512 functions to create a secured hash.

Community
  • 1
  • 1
Starx
  • 77,474
  • 47
  • 185
  • 261
  • You are right, but that is not what he asked for. He could also just use the primary key (number), but he needs a short url with chars. – Green Black Aug 16 '12 at 17:00
0

You can simply make a hash of the time, and afterwards check that hash (or part of that hash in your DB. If you set an index on that field in your DB (and make sure the hash is long enough to not make a lot of collisions), it won't be an issue at all time wise.

<?php

$hashChecked = false;

while( $hashChecked === false ){
  $hash = substr( sha1(time().mt_rand(9999,99999999)), 0, 8);  //varchar 8 (make sure that is enough with a very big margin)
  $q = mysql_query("SELECT `hash` FROM `tableName` WHERE `hash` = '".$hash."'");
  $hashChecked = mysql_num_rows() > 0 ? false : true;
}

mysql_query("INSERT INTO `tableName` SET `hash` = '".$hash."'");
Green Black
  • 5,037
  • 1
  • 17
  • 29
  • Your while loop will never break out it you get a collision, and doing a select for every insert is unneeded overhead. – sberry Aug 16 '12 at 16:53
  • You are right, I changed the code. Only way to avoid the 'overhead' is to use an auto increment field. – Green Black Aug 16 '12 at 16:55
0

This is fairly straightforward if you're willing to use a random number to generate your short URL. For example, you can do this:

 SELECT BASE64_ENCODE(CAST(RAND()*1000000 AS UNSIGNED INTEGER)) AS tag

This is capable of giving you one million different tags. To get more possible tags, increase the value by which the RAND() number is multiplied. These tag values will be hard to predict.

To make sure you don't get duplicates you need to dedupe the tag values. That's easy enough to do but will require logic in your program. Insert the tag values into a table which uses them as a primary key. If your insert fails, try again, reinvoking RAND().

If you get close to your maximum number of tags you'll start having lots of insert failures (tag collisions).

BASE64_ENCODE comes from a stored function you need to install. You can find it here:

http://wi-fizzle.com/downloads/base64.sql

If you're using MySQL 5.6 or higher you can use the built-in TO_BASE64 function.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
0

I wanted to do something similar (but with articles, not uploaded documents), and came up with something a bit different:

  • take a prime number [y] (much) larger than the max number [n] of documents there will ever be (e.g. 25000 will be large enough for the total number of documents, and 1000099 is a much larger prime number than 25001)
  • for the current document id [x]: (x*y) modulus (n+1)
  • this will generate a number between 1 and n that is never duplicated

although the url may look like a traditional primary key, it does have the slight advantage that each subsequent document will have a id which is totally unrelated to the previous one; some people also argue that not including the primary key also has a very slight security advantage...

ChrisW
  • 4,970
  • 7
  • 55
  • 92