Why is it common practice to use a bijective function and an incrementing numerical sequence for URL shortening?

Question

I've read the question and answer How to code a URL shortener? and all the math makes perfect sense. My question is, since you have to go back to the database/datastore anyway for the lookup, why not just generate a random short string in your alphabet and store it with the full URL in your datastore, rather than converting it back to a numerical ID?

It seems to me that this saves doing any math on the server, reduces complexity, and eliminates the 'walkability' of the short URL space (for my use-case, this is critical; URLs must not be guessed). If using a NoSQL store designed for key->value lookup, it doesn't seem that there is any potential performance issue of looking up the full URL value from a string as opposed to a numerical ID.

I'd like to know if I'm missing something.

"why not just generate a random short string" - the shorter the string, the more likely there will be a collision. — Matt, Jun 26 '14 at 21:33

Rich O'Kelly · Answer 1 · 2014-06-26T21:40:47.383

4

The random short string approach violates the bijectivity of the shortening function.

Given two URLs a and b and your shortening function f, it should be guaranteed that: if a = b then f(a) = f(b), however, since f generates a random value, the bijectivity is violated.

If however, you are just looking to shorten any particular URL and do not mind that subsequent shortenings of the same URL will generate different values, then the approach you outline above would be more efficient.

edited Jun 26 '14 at 21:40

answered Jun 26 '14 at 21:34

Rich O'Kelly

41,274
9
83
114

1

You're not describing bijectivity (or, more precisely, surjectivity), but a property inherent to *functions*. `f` is surjective (or left-unique) iff for all `a,b`: `f(a) = f(b)` implies `a = b`. `f` is functional (or right-unique) iff for all `a,b`: `a = b` implies `f(a) = f(b)`. – Jun 26 '14 at 22:28
3

@Rhymoid I agree with your essential point, but I think you intend "injective" where you say "surjective". – Matt Jun 26 '14 at 22:42
@Matt Well... that's embarrassing XD – Jun 26 '14 at 23:13
I don't think that is how the bijectivity works in the case of url shorteners. It is just from going shortlink<->id, not the original URL. I think to accomplish what you are saying, there needs to be a reverse lookup by the full URL. – Peter Jun 26 '14 at 23:50
1

@Pete, precisely, but that lookup will likely be less performant than the approach outlined in the How to code a URL shortener question. – Rich O'Kelly Jun 27 '14 at 00:11
@rich.okelly how so? For reverse lookup (seeing if the full URL has been shortened before), both need to look up by the full URL itself, since the shortened version is just based on a sequence (or in my case random)- in both cases not a hash of the full URL itself. – Peter Jun 27 '14 at 19:19
@Pete In the example where the sequence effectively forms a (bijective) hash, it is unnecessary to lookup to see whether the URL has been seen before, the behaviour whether it had or had not is the same – Rich O'Kelly Jun 29 '14 at 04:27
I don't see how this answers the question. You're stating that using a random function violates bijectivity. The question is why is bijectivity needed. I was curious about this same question while writing a shortener and it seems like you can just use an injective function if you don't care about the same long URL mapping to two different short URLs if shortened twice. – stuckj Aug 12 '16 at 16:04

Why is it common practice to use a bijective function and an incrementing numerical sequence for URL shortening?

1 Answers1