6

Just as a fun project I wanted to try and make a simple URL shortener for my own personal use but I wanted to try and incorporate things that I liked from other shorteners like bit.ly and such. So I've come to a snag when it comes to assigning short URL IDs.

Right now I just manually assign the code but I would like to automate it. I could do it the easy way by just assigning incrementing IDs (I thought this could be done using an assigned auto increment value on the MySQL database and just use the PHP dechex() function for the URL) but it seems that other shorteners are random.

I know that I won't get an absurd number of URLs in the database but I still want to keep the process efficient which makes creating random unique IDs rather taxing with many URLs in the database. I don't really have any idea about how to go about making a system to make the IDs that doesn't make duplicates and doesn't run slowly.

Laurel
  • 5,965
  • 14
  • 31
  • 57
Paul Young
  • 63
  • 1
  • 3
  • If you actually want to use the IDs from your database but make them *look* random (and shorter), use [`(new Id())->encode($id)`](https://github.com/delight-im/PHP-IDs). If you really want randomness, use something like [`Random::hexLowercaseString($length)`](https://github.com/delight-im/PHP-Random). – caw Nov 20 '19 at 16:37

3 Answers3

2

See: PHP short hash like URL-shortening websites and the answer you might want: http://blog.kevburnsjr.com/php-unique-hash

The second link might be particularly useful, just short-hash the current ID.

Community
  • 1
  • 1
Ry-
  • 218,210
  • 55
  • 464
  • 476
  • I'm liking this approach now that you and @a3_nm have convinced me of it. Sure it may not be random and may not be 8 characters but I think I've come to term with some of the limitations of what the system will have and the fact that it is really only a project for fun. Thanks guys for your help, I will be using a Base36 or Base64 system to generate my codes. – Paul Young Apr 28 '11 at 04:13
2

Use one of the common hash functions, like MD5 or SHA-1 to take the hash of your URL, print it as hexidecimal format, and take the last 8 characters (or the first 8 characters). This has the advantage that you can always determine whether the URL has already been submitted.

Ken Bloom
  • 57,498
  • 14
  • 111
  • 168
  • That was one of the first solutions I had come across but the issue is that there isn't any way to be sure that those 8 characters I chose would be unique, only the full hash would be unique. – Paul Young Apr 28 '11 at 03:20
  • You can't even guarantee that the full hash will be unique, but for all practical purposes it's close enough. If the first 8 characters isn't close enough to unique for practical purposes (I think it is -- lots of other software uses it as a convenient unique abbreviation for the full hash), then you want to come up with a deterministic way to modify the hash as a fallback (maybe taking a different window of 8 characters from the full hash). – Ken Bloom Apr 28 '11 at 03:28
  • Yeah I guess you're right. In the end I think I want to be safe and make sure there are no collisions at all. Thanks for the insight though. – Paul Young Apr 28 '11 at 04:05
  • If you're going that way, you might as well just use crc32 - it has 8 characters, naturally. Well, at least if you use something like http://www.php.net/manual/en/function.crc32.php#93797 . – Xiong Chiamiov Jun 27 '11 at 21:17
0

You can always generate random IDs, check if they have already been assigned, and draw a new one in the unlikely event where you hit one that had already been used. The lookup to see if they were already assigned should not be really slow since you're going to do it each time someone queries one of your URLs anyway.

If you want random hex strings, a quick and dirty way is to generate a random large number, hash it using sha1 or any other hash function, and take the first 8 chars. I don't really see why you would want hex rather than random base64 though, as base64 would allow you to pack more URLs into less characters. [Actually, you might want to generate IDs by hashing the URLs--should be as good as hashing random values if you use a secure crypto hash, and it would ensure that the same URL always gets the same key, preventing duplicates.]

Don't forget to start generating longer IDs once you hit a predefined number (or collide too often), as you wouldn't like to have the thing go slow as you run out of IDs and get a lot of collisions.

If you want nice theoretical guarantees about collision probabilities and all this sort of stuff, there are a lot of it, depending on the hashing scheme that you use.

Oh, and just on a side note, there are indeed some URL shorteners using sequential IDs, like http://lilurl.sourceforge.net/. I think the main reason that it's usually avoided is to prevent people with a good sense of timing to associate offensive IDs to URLs of their choosing...

a3nm
  • 8,717
  • 6
  • 31
  • 39
  • Yeah I thought that doing it the redraw way would be okay until someone pointed out the issue with the larger number of entries to compare against. – Paul Young Apr 28 '11 at 03:25