118

How do services like TinyURL or Metamark work?
Do they simply associate the tiny URL key with a [virtual?] web page which merely provides an "HTTP redirect" to the original URL? Or is there more "magic" to it?

[original wording] I often use URL shortening services like TinyURL, Metamark, and others, but every time I do, I wonder how these services work. Do they create a new file that will redirect to another page or do they use subdomains?

Abel
Nathan Campos
  • 1
    To rephrase the [non-]question: "How do services like TinyURL work? Do they simply associate the tiny URL key with a [virtual?] web page which merely provides an "HTTP redirect" to the original URL?" Is this what you are asking? – mjv Oct 13 '09 at 19:27
  • 1
    Do the shortened URLs ever expire? (i.e. the database entries for those URLs are removed from the servers) – thd Jan 26 '11 at 18:55
  • 2
    @thd: yes, but it can depend on (daily) hits and on the policy of the short URL service provider. They may also allow links that never expire; some require a membership for that. – Abel Mar 10 '11 at 22:40
  • Possible duplicate of [How does a URL Shortener work?](https://stackoverflow.com/questions/4572734/how-does-a-url-shortener-work) – roottraveller Sep 11 '17 at 07:36

4 Answers

239

No, they don't use files. When you click on a link like that, an HTTP request is sent to their server with the full URL, like http://bit.ly/duSk8wK (links to this question). They read the path part (here duSk8wK), which maps to a record in their database. In the database, they find a description (sometimes), your name (sometimes) and the real URL. Then they issue a redirect, which is an HTTP 302 response with the target URL in the Location header.
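
A rough sketch of that flow in Python, using only the standard library. The `LINKS` dictionary stands in for the database, and its target URL, the `resolve` and `serve` names, and the port are all made up for illustration; a real service would of course look quite different.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical stand-in for the database: short key -> real URL.
LINKS = {
    "duSk8wK": "https://example.com/some/long/page",
}


def resolve(request_path):
    """Return (status, headers) for a request path such as '/duSk8wK'."""
    key = urlsplit(request_path).path.lstrip("/")
    target = LINKS.get(key)
    if target is None:
        return 404, {}
    # A 302 with a Location header: the browser follows it directly,
    # no HTML page is loaded in between.
    return 302, {"Location": target}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port=8000):
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/duSk8wK` should answer with the 302 and the `Location` header; unknown keys get a 404.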

This direct redirect is important. If you were to use files or first load HTML and then redirect, the browser would add TinyURL to the history, which is not what you want. Also, the site that is redirected to will see the referrer (the site that you originally came from) as being the site the TinyURL link is on (i.e., twitter.com, your own site, wherever the link is). This is just as important, so that site owners can see where people are coming from. This, too, would not work if a page gets loaded that redirects.

PS: there are more types of redirect. HTTP 301 means a permanent redirect. If that were used, the browser could cache the redirect and stop requesting the bit.ly or TinyURL site, and those sites want to count the hits. That's why HTTP 302 is used, which is a temporary redirect. The browser will ask TinyURL.com or bit.ly again each time, which makes it possible to count the hits for you (some tiny URL services offer this).
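
To make the 301 vs. 302 difference a bit more concrete, here is a small hypothetical helper (the name `redirect_headers` and its parameters are mine, not any service's API). It only builds the status code and headers; the optional `max_age` illustrates how a 301 can be given a limited cache lifetime through the standard `Cache-Control` header.

```python
def redirect_headers(target, permanent=False, max_age=None):
    """Return (status, headers) for a redirect to `target`.

    permanent=False -> 302: not cached by default, so every click
                       reaches the shortener and can be counted.
    permanent=True  -> 301: browsers may cache it; an optional max_age
                       (in seconds) limits how long they do so.
    """
    status = 301 if permanent else 302
    headers = {"Location": target}
    if permanent and max_age is not None:
        headers["Cache-Control"] = f"max-age={max_age}"
    return status, headers
```

For example, `redirect_headers("https://example.com/", permanent=True, max_age=90)` yields a 301 that a browser is asked to re-check after 90 seconds.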

Jawwad
Abel
  • Considering it's just a map, a little light on the lifetime of each shortened url? – KG - Feb 23 '10 at 12:03
  • 3
    Actually I think, Bit.ly uses HTTP 301 instead of 302 (the last I heard) – Kenny Cason Aug 10 '10 at 15:58
  • 1
    Since bit.ly won't let you change where one of their URLs points to, 301 makes sense. No need to remember the bit.ly version and recheck it. – Joost Schuur Aug 22 '10 at 19:10
  • 11
    @KennyCason / @Joost Schuur: it is indeed HTTP 301 that is used, however, with a timestamp (a cache expiry). In practice this makes it behave like a `Moved` rather than a `Moved Permanently`. This is a subtle difference. With the timestamp, the browser should check whether the resource has changed once this timeout is reached. Others, like is.gd, use a normal `301 Moved Permanently` and the browser doesn't need to re-check (but often will). Finally, services like url4.eu do not redirect at all, but show you an advertisement first. With the 301 the services can still count *unique visitors*, but not all hits. – Abel Aug 22 '10 at 21:49
  • @abel do these services check for duplicates and assign the id to a url that has already been added? so if two people add google.com should it give back the id of abc? or is that just a smart feature? – Steve Jun 21 '11 at 15:27
  • @Steve: I've seen services where you can have an account to see which URLs you shortened, how many hits you have, etc. There you can also see that you can have multiple IDs / short URLs for the same link. However, that is not to say that other services cannot use a smarter algorithm and reuse the same ID for the same URL. That would remove the possibility of monitoring hits per user, though. – Abel Jun 25 '11 at 10:11
  • 6
    The example bitly URL is now a real one and actually redirects back to this question ;-) See http://bitly.com/duSk8wK+ for the info page. – Ronald Nov 22 '11 at 21:06
  • I clicked the link from three different browsers and it consistently showed the same number of clicks. How does bit.ly know that these shouldn't be new 'click' counts? – Costa Michailidis Aug 28 '15 at 04:26
  • 1
    @Costa: there are many ways of counting clicks, it is possible that it keeps track of your network card ID, which is a way of tracking whether a request comes from the same computer. It is possible to fake or change that ID, it is not foolproof. Also, it is possible that it tracks or uses third-party cookies that have been set earlier and to the same user on different browsers, which you can check by clearing all session data and using an anonymous browser. – Abel Aug 28 '15 at 08:43
  • If bitly uses 301 which is permanent redirection then how would it keep track of number of hits? – Madhusudan Joshi Jul 07 '18 at 10:30
  • @mad, see my comment above, it explains exactly that and also that 301 is not always permanent forever – Abel Jul 09 '18 at 01:10
114

Others have answered how the redirects work, but you should also know how they generate their tiny URLs. You'll often hear, mistakenly, that they create a hash of the URL in order to generate the unique code for the shortened URL. This is incorrect in most cases: they aren't using a hashing algorithm (where you could potentially have collisions).

Most of the popular URL shortening services simply take the ID in the database of the URL and then convert it to either Base 36 [a-z0-9] (case insensitive) or Base 62 [a-zA-Z0-9] (case sensitive).
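
As a minimal sketch of that conversion in Python: the function names and the ordering of the alphabets (digits first, then lowercase, then uppercase) are my own assumptions; a real service may order or even shuffle its alphabet differently.

```python
import string

BASE36 = string.digits + string.ascii_lowercase   # 0-9, a-z
BASE62 = BASE36 + string.ascii_uppercase          # 0-9, a-z, A-Z


def encode(number, alphabet=BASE36):
    """Turn a non-negative database ID into a short key."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode(key, alphabet=BASE36):
    """Turn a short key back into the database ID."""
    base = len(alphabet)
    number = 0
    for char in key:
        number = number * base + alphabet.index(char)
    return number
```

With these alphabets, `encode(20103)` gives `"fif"` in Base 36 and `encode(20103, BASE62)` gives `"5ef"`; `decode` reverses both.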

A simplified example of a TinyURL Database Table:

ID       URL                           VisitCount
 1       www.google.com                        26
 2       www.stackoverflow.com               2048
 3       www.reddit.com                        64
...
 20103   www.digg.com                         201
 20104   www.4chan.com                         20

Web frameworks that allow flexible routing make handling the incoming URLs really easy (Ruby on Rails, ASP.NET MVC, etc.).

So, on your web server you might have a route action that looks like this (pseudocode):

Route: www.mytinyurl.com/{UrlID}
Route Action: RouteURL(UrlID);

This routes any incoming request to your server that has any text after your domain www.mytinyurl.com to your associated method, RouteURL. It supplies the text that comes after the forward slash in your URL to that method.

So, let's say you requested: www.mytinyurl.com/fif

"fif" would then be passed to your method, RouteURL(String UrlID). RouteURL would then convert "fif" from Base 36 to its base-10 equivalent, 20103, and a database request would be made to look up whatever URL is stored under the ID 20103 (in this case, www.digg.com). You would also increase the visit count for Digg by one before redirecting to that URL.
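
A hedged Python version of what such a RouteURL method might do, with an in-memory SQLite database standing in for the real one. The table and column names (`urls`, `visit_count`) and the `make_db` helper are assumptions for this sketch; the two rows are taken from the example table above.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory table mirroring the example above."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE urls ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL, "
        "visit_count INTEGER NOT NULL DEFAULT 0)"
    )
    db.executemany(
        "INSERT INTO urls (id, url, visit_count) VALUES (?, ?, ?)",
        [(20103, "www.digg.com", 201), (20104, "www.4chan.com", 20)],
    )
    return db


def route_url(db, url_id):
    """Return the stored URL for a short key such as 'fif', or None."""
    try:
        row_id = int(url_id, 36)   # Base 36 key -> base-10 database ID
    except ValueError:
        return None                # not a valid Base 36 key
    row = db.execute("SELECT url FROM urls WHERE id = ?", (row_id,)).fetchone()
    if row is None:
        return None
    # Count the visit before the caller issues the redirect.
    db.execute(
        "UPDATE urls SET visit_count = visit_count + 1 WHERE id = ?", (row_id,)
    )
    db.commit()
    return row[0]
```

Here `route_url(make_db(), "fif")` returns `"www.digg.com"` and bumps its visit count from 201 to 202; the caller would then send the actual HTTP redirect.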

This is a really simplified example but you should be able to get the general idea.

A Salcedo
  • 12
    Thanks for the nice explanation. So what happens when someone tries to create a short URL for an already existing long URL? Do they perform a full text search on the database? I do not think so as it will be too much time consuming. Hash or message digest based approach looks more practical. – Piyush Kansal Oct 02 '13 at 07:13
  • @PiyushKansal you could use the hash internally to do a `O(1)` lookup to find duplicates; and then route the existing tiny URL for that, or could choose to generate a new one. As far as I can tell, `goo.gl` reuses the tiny urls for the same URL; try this on your end for this page: Do you get this >> `goo.gl/8gVb8X` ? – Kingz Jun 04 '18 at 16:29
  • How do they handle url parameters? For example www.digg.com?filter=123 – Ronen Aug 12 '18 at 07:44
7

As an extension to @A Salcedo's answer:

Some URL shortening services (Tinyarro.ws) go to the extreme of using Unicode (UTF-8) characters in the shortened URL, which allows a larger number of websites before an additional symbol has to be added. Since most of UTF-8 is accepted for use (IRI, RFC 3987, handled by most browsers), that raises the count from 62 sites per symbol to ~1,112,064.

To put this in perspective, one can encode 1.2366863e+12 sites with 2 symbols (1,112,064 * 1,112,064). In November 2009, shortened links on bit.ly were accessed 2.1 billion times (around that time, bit.ly and TinyURL were the most widely used URL-shortening services), which is ~600 times fewer than what fits in just 2 symbols. So, for the full duration of existence of all URL shortening services, it should last at least another 20 years before a third symbol has to be added.
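
The arithmetic can be checked with a few lines of Python. The constant names are mine, and the figure should be read as an upper bound: I am assuming every UTF-8-encodable code point could be used as a symbol, which in practice it cannot (control characters, unassigned code points and so on).

```python
# Unicode has 0x110000 code points; 0x800 (2,048) of them are surrogates
# (U+D800..U+DFFF), which UTF-8 cannot encode.
UTF8_SYMBOLS = 0x110000 - 0x800           # 1,112,064

two_symbol_keys = UTF8_SYMBOLS ** 2       # keys that fit in 2 symbols

BITLY_CLICKS_NOV_2009 = 2_100_000_000     # figure quoted in the answer

ratio = two_symbol_keys / BITLY_CLICKS_NOV_2009

print(f"symbols per position: {UTF8_SYMBOLS:,}")
print(f"two-symbol keys:      {two_symbol_keys:,}")
print(f"ratio to 2.1 billion: {ratio:.0f}")
```

This gives 1,236,686,340,096 two-symbol keys, roughly 589 times the 2.1 billion figure, which matches the "~600 times" estimate.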

Community
Matas Vaitkevicius
7

In simple words, a URL shortener maps an arbitrarily long sequence of characters (the original, long, crappy URL) into a short and slick sequence of characters. This is nothing but hashing, which is most commonly used to create lookup tables, HashMaps, MD5 hashes for cryptographic purposes, etc.

To understand the URL-shortening process, I have created a demo project on GitHub and also a blog post. Do refer to these and let me know if they were helpful.
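
For illustration only, a hash-based shortener could look roughly like the sketch below. This is not the code from the demo project; the `short_code` name, the 7-character key length and the salting scheme for collisions are assumptions of mine. It also shows why collisions have to be handled once a digest is truncated.

```python
import hashlib


def short_code(url, table, length=7):
    """Derive a short key from an MD5 digest, re-hashing on collisions.

    `table` is a dict (key -> URL) standing in for the database.
    """
    salt = 0
    while True:
        data = url if salt == 0 else f"{url}#{salt}"
        code = hashlib.md5(data.encode("utf-8")).hexdigest()[:length]
        existing = table.get(code)
        if existing is None:
            table[code] = url      # free slot: claim it
            return code
        if existing == url:
            return code            # same URL submitted again: reuse its key
        salt += 1                  # different URL, same key: try another hash
```

Submitting the same URL twice returns the same key, while two different URLs that happen to share a truncated digest end up with different keys.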

Blog Post : URL Shortening

Anand Joshi