We are building a URL shortening functionality for our site.
What we've come up at the moment:
- We take a URL (http://www.google.com) and sha1 it and end up with a 40 character hash (738ddf35b3a85a7a6ba7b232bd3d5f1e4d284ad1).
- We take the sha1 hash and encode it into base62 (basically A-Z, a-z, 0-9) and end up with a 28 character hash (jNMYchEoche67ro1k5gsCcHfDzmR) that we can decode back to the original sha1.
The reason we use sha1 is to make sure users cannot guess the next URL from the current / past URL(s).
The reason we use base62 is to make the URL valid and readable to the user.
Now a 28 character "short-url" that will be appended to our domain (http://www.google.com/r/jNMYchEoche67ro1k5gsCcHfDzmRis) is a bit too long, especially when you consider Twitters character limit.
What we're currently considering is cutting the sha1 down by around 20 characters, which would produce a 14 character short-url, but going down any further and we're worried we will run into collisions too quickly.
We also thought about Compressing big number (or string) to small value but that would require us to split the 28 or 14 character hash into pieces of 2 and sort the pieces and we have no idea how to return to the original hash from there.
Does anyone have any idea on what we could do? We would prefer a solution where we do not depend on the database to construct the URL, but if a DB is required please keep in mind that we're limited to Redis / MongoDB (which means no auto-increment integer field).