2

I read this post : How to code a URL shortener? that perfectly explains how to create a classical URL shortener (please read it to understand my question below).

But I'd like to add 3 more requirements that match my specific needs, and I'm not sure how to deal with that:

1 (the most important) -> I'd like that no one can guess an URL, just by watching other URLs.

Indeed, curently (in the post I quoted): generated URLs are just following themselves, changing the last character. So, if F(2789) = "ajuka", then it's easy to know that F(2790) = "ajukb"

Of course, the function still MUST be bijective, and reversible (find the id from the url, and find the url from the id => hash functions does not work).

2 -> I'd like to remove some characters that may be confusing (0/O/l/I) from the a-z/A-Z/0-9 list. This not seem really hard.

3 -> I'd like to have a minimum amount of characters in the URL (let's say 5), which means my first url will be "aaaaa" (not "a").

Thanks.

Community
  • 1
  • 1
David Dahan
  • 10,576
  • 11
  • 64
  • 137
  • 1
    Hash functions do work, you simply need to store the hash / id mapping. The function does not need to be bijective and reversible. – Simeon Visser Dec 05 '13 at 23:20
  • 1
    The function needs to be injective (bijective is not needed) to ensure that there are no url-collisions. – MrSmith42 Dec 05 '13 at 23:24
  • If it's a known hash function, a person could test for the existence of a given url in the shortener. I'm not sure whether this is undesirable or not. – recursive Dec 05 '13 at 23:24

2 Answers2

3
  1. When someone creates a shortened url, generate a random id in your id space. Repeat until you generate an unused one. You will need to make sure your id space never fills up too much otherwise this step could take a long time. You can write your id generator so that it excludes characters like 0 and l to avoid ambiguity per your requirements.
  2. Store the id and url in a database.
  3. When a request comes in for a given id, look it up in your database, and return the url associated with the given id.
recursive
  • 83,943
  • 34
  • 151
  • 241
  • 1
    4. Make sure the ID space is big enough that it can't be exhaustively searched. Something like 60 bits (= 10 base64 characters) would be a reasonable minimum. – Ilmari Karonen Dec 06 '13 at 00:16
  • Oh, and 5. Use a [secure random number generator](http://www.php.net/manual/en/function.openssl-random-pseudo-bytes.php) to generate your random IDs, so that they can't be predicted. – Ilmari Karonen Dec 06 '13 at 00:21
  • I think your point 5 is covered in my point 1. ("random id in your id space") – recursive Dec 06 '13 at 02:02
  • The point (put not intended) of my point 5 is that just using rand() or mt_rand() to generate the IDs is _not_ a good idea (at least unless you're running on something like Suhosin, which modifies those functions to improve their security a bit). There's random and then there's random, and this is one of the cases where you need the better kind of randomness. – Ilmari Karonen Dec 06 '13 at 02:21
  • Point 4 is relevant and rise a new problem : how to keep a user-friendly URLs and avoid bruteforce attacks in the same time? `http://myservice.com/5g4f` -> user-friendly but searchable `http://myservice.com/5g4f896ghUupo` -> unsearchable but can repulse people to use it. – David Dahan Dec 06 '13 at 10:05
  • @recursive : The solution to create a new id until it is not used is efficient but not really "pretty". In addition in my case it's not easy to know if the id space will be filled or not. Nevertheless, the main advantage I see is that you don't need any hash/crypt/obfuscation. Indeed, even if the mapping [URL to ID] or [id to URL] is easy to do for anyone, hacker does not now which URL/ID to search. Thanks. – David Dahan Dec 06 '13 at 15:12
1
  1. You don't need to share your algorithm with anyone, so I think you can get away with a bit of obfuscation instead of full cryptographic security. I'd use a combination of the following:

    (a) Randomize the order of your base-62 symbols (62!=3.14 E+85 permutations)

    (b) Randomize the order of the digits in your base-62 number

    (c) Use one or more linear congruential generators with a modulus of 62^6 to modify each ID value. As long as you choose your parameters carefully, an LCG will produce a repeating sequence of maximal length, which means as long as you pass each ID through the LCG the same number of times, you will never have two IDs land on the same value.

  2. All right then, base-58. It doesn't make much difference :-)

  3. Just pad the base-58 numbers with leading zeros (or whatever symbol you're using to represent "zero").

r3mainer
  • 23,981
  • 3
  • 51
  • 88