
I have a big URL that I want to shrink into a smaller URL. I have implemented the routing part in my Rails application; the tricky part is shortening the actual URL. Is there a recommended algorithm that shortens a string into a fixed number of characters (which could be a mix of numbers and letters)?


Sorry for not providing an example. Say, for instance, I have "localhost:3000/orders/1". I need something like "localhost:3000/:somesmallstring".


example.com/orders/1/show_video and exmpl.com/shortened_url should both lead to the same page. I own both domains.

Also consider this example. Say my site uses the domain name example.com. Can I also serve the shortened URLs from exmpl.com (e.g. exmpl.com/shortened_url) for the same site? I have purchased both domains. What changes should I make in the routes file so that the shortening module looks up the real id only for requests coming from that other domain? Is there a way to do this?

New Alexandria
Aravind
  • A string of arbitrary length cannot be shortened to five digits. A hexatridecimal (base-36) string of length five can distinguish only 36^5 = 60,466,176 different strings. – sawa Oct 08 '13 at 10:01
  • @sawa: But a set of one million such strings could be referenced that way. We need to see OP's starting strings before advising – Neil Slater Oct 08 '13 at 10:02
  • @NeilSlater Right. That is my point. The OP did not put any restriction on the source string. That is what I am claiming. – sawa Oct 08 '13 at 10:05
  • Given your new examples, `/orders/1` is already a pretty short path. It would be trivial to provide an alias for it like `/o/1` as well (see Adeptus's answer). You start the question with "I have a big URL that I want to shrink". In what way do you consider `example.com/orders/1/show_video` big, and would `exmpl.com/o/1/sv` be acceptably smaller? – Neil Slater Oct 08 '13 at 11:29
  • I just provided a smaller example. I have many URLs like this with longer expansions. The question is how I can handle `exmpl.com/some_url` as the same thing as `example.com/url`. – Aravind Oct 08 '13 at 11:39
  • Can you use a DB? Store each address in the DB and use `exmpl.com/ID`. – mrd abd Oct 08 '13 at 13:56
  • possible duplicate of [Best way to create unique token in Rails?](http://stackoverflow.com/questions/6021372/best-way-to-create-unique-token-in-rails) – New Alexandria Oct 08 '13 at 14:10

3 Answers


You can try resolving it by giving your resources shorter path names, in config/routes.rb:


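  Rails.application.routes.draw do
    # :path changes only the URL segment; route names and controllers stay the same
    resources :authors, :path => "aut" do
      resources :articles, :path => "art"
    end
  end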

Running `rake routes` on the command line then produces:

author_articles     GET    /aut/:author_id/art(.:format)          articles#index
                    POST   /aut/:author_id/art(.:format)          articles#create
new_author_article  GET    /aut/:author_id/art/new(.:format)      articles#new
edit_author_article GET    /aut/:author_id/art/:id/edit(.:format) articles#edit
author_article      GET    /aut/:author_id/art/:id(.:format)      articles#show
                    PUT    /aut/:author_id/art/:id(.:format)      articles#update
                    DELETE /aut/:author_id/art/:id(.:format)      articles#destroy
authors             GET    /aut(.:format)                         authors#index
                    POST   /aut(.:format)                         authors#create
new_author          GET    /aut/new(.:format)                     authors#new
edit_author         GET    /aut/:id/edit(.:format)                authors#edit
author              GET    /aut/:id(.:format)                     authors#show
                    PUT    /aut/:id(.:format)                     authors#update
                    DELETE /aut/:id(.:format)                     authors#destroy
Adeptus
  • This definitely provides shorter aliases for a set of URL paths, but without any examples from the OP, it is a bit of a guess. – Neil Slater Oct 08 '13 at 10:01

You can use hash functions. For example, the MD5 digest of a string of any length, written in hex, is always 32 characters long.

But this doesn't guarantee that your URL mapping will be unique, since different strings can produce the same hash. Hash functions are also one-way functions, so you cannot reverse the process to recover the original URL.
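A minimal sketch of this idea in Ruby, using the standard library's `Digest::MD5`. Because the digest cannot be reversed, the sketch keeps a hypothetical in-memory `LOOKUP` hash from digest back to the original URL; a real application would use a database table instead. The helper names are illustrative assumptions.

```ruby
require 'digest/md5'

# Hypothetical in-memory store: digest => original URL.
# A real app would persist this (e.g. in a database table).
LOOKUP = {}

def shorten_with_md5(url)
  digest = Digest::MD5.hexdigest(url) # always 32 hex characters
  # MD5 cannot be reversed, so remember which URL produced this digest.
  LOOKUP[digest] = url
  digest
end

def expand(digest)
  LOOKUP[digest] # nil if this digest was never stored
end

key = shorten_with_md5("http://localhost:3000/orders/1")
puts key.length  # => 32
puts expand(key) # => http://localhost:3000/orders/1
```

Note that 32 characters is longer than a short path such as `/orders/1`, so hashing by itself does not make short URLs shorter, and in this sketch a collision would silently overwrite the earlier entry.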

mrd abd
  • The hashes would need to be stored in a DB in order to look them up and find the associated command. In that case the OP may as well use, e.g., a URL-safe Base64-encoded number, and just give a sequence number to each allowed request (as bit.ly or tinyurl.com do). – Neil Slater Oct 08 '13 at 13:43

Is there a recommended algorithm that shortens a string into a set number of elements (could be a mix of number and strings)

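As sawa's comment points out, there is no such algorithm for arbitrary input strings: a fixed number of characters can represent only a fixed number of distinct values.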

If you have a finite set of allowed strings, however, you can enumerate them and express each one's index in a suitable base. There is a well-known and well-supported "URL-safe" version of Base64, ideal for representing arbitrary compressed data inside URL paths.
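To see why the URL-safe variant matters: standard Base64 uses `+` and `/`, which have special meanings in URLs, while the URL-safe alphabet substitutes `-` and `_`. A small comparison using Ruby's standard `base64` library (the two input bytes are chosen arbitrarily so that those two symbols appear):

```ruby
require 'base64'

# Two arbitrary bytes whose encoding hits the last two Base64 symbols (62 and 63).
bytes = [0xFB, 0xFF].pack('C*')

standard = Base64.strict_encode64(bytes)  # uses '+' and '/'
url_safe = Base64.urlsafe_encode64(bytes) # uses '-' and '_' instead

puts standard # => +/8=
puts url_safe # => -_8=
```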

For instance, your integer order id is already enumerable. If you can safely assume that the maximum allowed value fits in a 32-bit unsigned integer, you can encode it as follows:

require 'base64'
number_to_encode = 1_234_567_890
compact_string = [number_to_encode].pack('N*') # Network byte order
encoded = Base64.urlsafe_encode64( compact_string )
# => "SZYC0g=="

That has taken an id of up to 10 digits and turned it into an 8-character URL-safe string. To decode it back to the number you need:

require 'base64'
string_to_decode = "SZYC0g==" # e.g. params[:order_id] from /o/:order_id
packed_string = Base64.urlsafe_decode64( string_to_decode )
number = packed_string.unpack('N*').first
# => 1234567890
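The two trailing `=` characters in `SZYC0g==` are only padding. A sketch, assuming Ruby 2.3 or later (where `urlsafe_encode64` accepts `padding: false` and `urlsafe_decode64` tolerates missing padding), that drops them to get a 6-character token; the helper names `encode_id` and `decode_id` are hypothetical:

```ruby
require 'base64'

# Pack a 32-bit unsigned id in network byte order, then Base64-encode without '=' padding.
def encode_id(id)
  raise ArgumentError, "id must fit in 32 bits" unless id.between?(0, 0xFFFF_FFFF)
  Base64.urlsafe_encode64([id].pack('N'), padding: false)
end

# Reverse the process; urlsafe_decode64 re-adds any missing padding before decoding.
def decode_id(token)
  Base64.urlsafe_decode64(token).unpack('N').first
end

puts encode_id(1_234_567_890) # => SZYC0g
puts decode_id("SZYC0g")      # => 1234567890
```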

In principle, you can express any kind of data via this approach, provided you can unpack and disambiguate it in the relevant controller. However, there are limits on compression. You cannot take a parameter that is an arbitrary 32-bit integer and fit it into 5 Base64 characters, because each Base64 character carries at most 6 bits of your data, so five characters hold only 30 bits.
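The arithmetic behind that limit, as a quick Ruby check: five Base64 characters can distinguish 64^5 = 2^30 values, fewer than the 2^32 values of an arbitrary 32-bit id, so at least six characters are required.

```ruby
BITS_PER_BASE64_CHAR = 6 # 64 = 2**6 symbols
id_bits = 32             # an arbitrary 32-bit integer

capacity_5_chars = 64**5 # distinct values 5 characters can represent
needed_chars = (id_bits.to_f / BITS_PER_BASE64_CHAR).ceil

puts capacity_5_chars # => 1073741824
puts 2**id_bits       # => 4294967296
puts needed_chars     # => 6
```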

If you need short URLs like those you may have seen at bit.ly or tinyurl.com, they are made by creating a large lookup table of URLs and encoding the id of each row in that table in a similar way to the above. Alternatively, you could store a short reference in a uniquely indexed column on each model, and either put sequence numbers into that column or generate random strings that you check for uniqueness. All of these approaches boil down to the same thing: have a limited set of references to resolve, turn each one into a number (either the item's actual count, or something unique chosen below a theoretical maximum), and use an encoding scheme like Base64 to write that number in fewer characters than base 10 would need.
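A minimal in-memory sketch of the lookup-table approach, assuming a Ruby `Hash` stands in for the database table and a running counter stands in for the auto-increment row id. For brevity it encodes ids with Ruby's built-in base 36 (`Integer#to_s(36)` and `String#to_i(36)`) rather than Base64; the class and method names are hypothetical.

```ruby
# Hypothetical stand-in for a database table of URLs keyed by a sequence id.
class ShortUrlTable
  def initialize
    @urls_by_id = {} # id => long URL (the "rows")
    @ids_by_url = {} # long URL => id, so the same URL reuses its row
    @next_id = 1     # like an auto-increment primary key
  end

  # Returns a short token for the URL, creating a new row if needed.
  def shorten(url)
    id = @ids_by_url[url]
    unless id
      id = @next_id
      @next_id += 1
      @urls_by_id[id] = url
      @ids_by_url[url] = id
    end
    id.to_s(36) # base 36: digits 0-9 and letters a-z
  end

  # Returns the original URL, or nil for an unknown or malformed token.
  def expand(token)
    return nil unless token.match?(/\A[0-9a-z]+\z/)
    @urls_by_id[token.to_i(36)]
  end
end

table = ShortUrlTable.new
token = table.shorten("http://example.com/orders/1/show_video")
puts token               # => 1
puts table.expand(token) # => http://example.com/orders/1/show_video
```

With base 36, ids up to 46,655 (`"zzz"`) fit in three characters; switching the encoding to URL-safe Base64 packs more ids into the same length.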

Neil Slater