80

I'm currently using the following function to 'convert' a relative URL to an absolute one:

function qualifyURL(url) {
    var a = document.createElement('a');
    a.href = url;
    return a.href;
}

This works quite well in most browsers but IE6 insists on returning the relative URL still! It does the same if I use getAttribute('href').

The only way I've been able to get a qualified URL out of IE6 is to create an img element and query its 'src' attribute. The problem with this is that it generates a server request, which is something I want to avoid.

So my question is: Is there any way to get a fully qualified URL in IE6 from a relative one (without a server request)?


Before you recommend a quick regex/string fix I assure you it's not that simple. Base elements + double period relative urls + a tonne of other potential variables really make it hell!

There must be a way to do it without having to create a mammoth of a regex'y solution??

James

11 Answers

46

How strange! IE does, however, understand it when you use innerHTML instead of DOM methods.

function escapeHTML(s) {
    return s.split('&').join('&amp;').split('<').join('&lt;').split('"').join('&quot;');
}
function qualifyURL(url) {
    var el= document.createElement('div');
    el.innerHTML= '<a href="'+escapeHTML(url)+'">x</a>';
    return el.firstChild.href;
}

A bit ugly, but more concise than Doing It Yourself.

bobince
  • I found this similar solution on a blog which doesn't need the code to escape: http://stackoverflow.com/a/22918332/82609 – Sebastien Lorber Apr 07 '14 at 16:46
  • This approach replaces null (U+0000) with � (U+FFFD), according to [HTML spec](https://www.w3.org/TR/html5/syntax.html#attribute-value-(double-quoted)-state). – Oriol Apr 09 '16 at 14:32
26

As long as the browser implements the <base> tag correctly, which browsers tend to do:

function resolve(url, base_url) {
  var doc      = document
    , old_base = doc.getElementsByTagName('base')[0]
    , old_href = old_base && old_base.href
    , doc_head = doc.head || doc.getElementsByTagName('head')[0]
    , our_base = old_base || doc_head.appendChild(doc.createElement('base'))
    , resolver = doc.createElement('a')
    , resolved_url
    ;
  our_base.href = base_url || '';
  resolver.href = url;
  resolved_url  = resolver.href; // browser magic at work here

  if (old_base) old_base.href = old_href;
  else doc_head.removeChild(our_base);
  return resolved_url;
}

Here's a jsfiddle where you can experiment with it: http://jsfiddle.net/ecmanaut/RHdnZ/

ecmanaut
  • It's three years late to the party, so it will take awhile to rise to the top without either marketing or lots of people having the issue and wanting a code-conservative and accurate solution. – ecmanaut Jul 03 '13 at 17:11
  • Better late than never. I think your solution is clever and more robust than the others; relying on the browser seems the way to go, even if injecting a base tag might be perceived as kind of hacky. – Hartator Jul 05 '13 at 10:41
  • 2
    Other than supporting arbitrary base URLs, how exactly is this different than the solution presented in the question? Does it work on IE 6? – John Jul 21 '13 at 01:27
  • That is the only difference. If you have an IE6 around, follow the link above and try; if the last form element says https://google.com/search?foo=bar it does work in IE6. – ecmanaut Jul 21 '13 at 07:59
  • Shouldn't the commas in your `var doc = , ... `be before the linebreaks? Otherwise JS will just execute the above as separate statements, with a null statement at the end. Also, won't all but the doc variable then be globals? – Chris Middleton Aug 21 '14 at 20:16
  • 1
    @AmadeusDrZaius Not should, but they can be if you like. Javascript only adds auto semicolon at the end of a line when doing it won't make the upcoming line an invalid statement. ", foo = 1" is a syntax error, and thus the whole var statement is evaluated in bulk, sans semicolon insertion. – ecmanaut Aug 22 '14 at 01:27
  • Can there be a jQuery version of this? I tried to port it line by line but the code still breaks. – Akshay Raje Feb 18 '15 at 09:45
  • not bad, but at least FF 31.0 with **`resolve('#x')` returns `http://.../mypage/undefined#x` :-(**, but `resolve('.#x')` is ok: `http://.../mypage/#x` – Andreas Covidiot Mar 10 '15 at 22:09
  • **IE 11.09 + Chrome 40.0 same problem with `resolve('#x')` :-(** (at least some consistency ;) ) – Andreas Covidiot Mar 10 '15 at 22:15
  • This answer relies on some "browser magic", but the question was precisely about making this "browser magic" work in IE6. So this code can be very useful, but I don't see how it answers the question. – Oriol Mar 03 '16 at 17:37
  • 2
    @AndreasDietrich That's because you don't pass any argument to the `base_url` parameter, so it becomes `undefined` and is stringified to `"undefined"`. You should pass the empty string instead. Or, if you want to make the 2nd parameter optional, use `our_base.href = base_url || ""` instead of `our_base.href = base_url`.. – Oriol Mar 03 '16 at 17:42
  • 1
    Good idea, @Oriol – no reason not to have a friendlier default behaviour for people not passing both parameters. Integrated. – ecmanaut Mar 03 '16 at 21:33
16

You can make it work on IE6 just by cloning the element:

function qualifyURL(url) {
    var a = document.createElement('a');
    a.href = url;
    return a.cloneNode(false).href;
}

(Tested using IETester on IE6 and IE5.5 modes)

Oriol
10

On this blog I found another method that closely resembles @bobince's solution.

function canonicalize(url) {
    var div = document.createElement('div');
    div.innerHTML = "<a></a>";
    div.firstChild.href = url; // Ensures that the href is properly escaped
    div.innerHTML = div.innerHTML; // Run the current innerHTML back through the parser
    return div.firstChild.href;
}

I find it a little more elegant, though it's not a big deal.

Sebastien Lorber
7

URI.js seems to solve the issue:

URI("../foobar.html").absoluteTo("http://example.org/hello/world.html").toString()

See also http://medialize.github.io/URI.js/docs.html#absoluteto

Not tested with IE6, but maybe helpful for others searching for the general issue.
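
For readers who do not need IE6 at all, the same kind of resolution can be sketched with the standard `URL` constructor. This is an assumption about the runtime (modern browsers and Node.js provide it; IE6 does not), and `absoluteTo` here is just an illustrative wrapper name, not part of URI.js:

```javascript
// Minimal sketch, assuming a runtime that provides the standard URL constructor
// (modern browsers, Node.js). Not available in IE6.
// `absoluteTo` is a hypothetical wrapper name chosen for this example.
function absoluteTo(relative, base) {
  // The second argument is the base URL the relative one is resolved against.
  return new URL(relative, base).href;
}
```

Calling `absoluteTo("../foobar.html", "http://example.org/hello/world.html")` resolves the `..` segment against the base's directory, and scheme-relative inputs such as `//foo.com/bar/` pick up the base's scheme.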

koppor
  • 1
    On the node side of things (for crawling, etc), the correct library here is available via `npm install URIjs`, not the other library by similar name – Josh Hibschman Aug 20 '15 at 15:19
  • the npm package named has changed to `urijs` https://github.com/medialize/URI.js#using-urijs – Daniel Lizik Sep 19 '16 at 17:56
7

I actually wanted an approach to this that didn't require modifying the original document (not even temporarily) but still used the browser's built-in URL parsing. I also wanted to be able to provide my own base (like ecmanaut's answer). It's rather straightforward, but uses createHTMLDocument (which could possibly be replaced with createDocument to be a bit more compatible):

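function absolutize(base, url) {
    // Build a detached document so the live page is never modified.
    // The empty title is passed because some older browsers require the argument.
    var d = document.implementation.createHTMLDocument('');
    var b = d.createElement('base');
    d.head.appendChild(b);
    var a = d.createElement('a');
    d.body.appendChild(a);
    b.href = base;
    a.href = url;
    return a.href;
}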

http://jsfiddle.net/5u6j403k/

Chris Hopman
  • 1
    Not sure if I am missing something, but IE6 (nor 7, 8) does not support `document.implementation.createHTMLDocument` – Oriol Mar 03 '16 at 13:19
  • I used this when I was using a web app to load and scrape other pages. In the callback from jQuery.load, `$("#loadedHere").createElement("a").url="foo"` resulted in an empty url so I had to resort to creating a separate doc. – ericP May 05 '17 at 09:28
5

This solution works in all browsers.

/**
 * Given a filename for a static resource, returns the resource's absolute
 * URL. Supports file paths with or without origin/protocol.
 */
function toAbsoluteURL (url) {
  // Handle absolute URLs (with protocol-relative prefix)
  // Example: //domain.com/file.png
  if (url.search(/^\/\//) != -1) {
    return window.location.protocol + url
  }

  // Handle absolute URLs (with explicit origin)
  // Example: http://domain.com/file.png
  if (url.search(/:\/\//) != -1) {
    return url
  }

  // Handle absolute URLs (without explicit origin)
  // Example: /file.png
  if (url.search(/^\//) != -1) {
    return window.location.origin + url
  }

  // Handle relative URLs
  // Example: file.png
  var base = window.location.href.match(/(.*\/)/)[0]
  return base + url
}

However, it doesn't support relative URLs with ".." in them, like "../file.png".
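
One way the missing ".." handling could be approached is to collapse dot segments in the path before the origin is prepended. The helper below is a simplified sketch, not part of the function above: `removeDotSegments` is a hypothetical name, it assumes its input is an absolute path starting with "/", and it does not reproduce every edge case of RFC 3986 (for example, a trailing ".." loses its final slash).

```javascript
// Hypothetical helper (not part of the answer above): collapse "." and ".."
// segments in an absolute, slash-separated path such as "/a/b/../file.png".
function removeDotSegments(path) {
  var input = path.split('/');
  var output = [];
  for (var i = 0; i < input.length; i++) {
    var segment = input[i];
    if (segment === '..') {
      // Go up one level, but never above the root
      // (the root is the leading empty segment produced by split).
      if (output.length > 1) output.pop();
    } else if (segment !== '.') {
      output.push(segment);
    }
  }
  return output.join('/');
}
```

For example, `removeDotSegments('/a/b/../file.png')` yields `'/a/file.png'`, so the relative branch could build the directory path plus the URL, pass it through this helper, and then prepend the origin.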

Feross
  • This has some problems. For example, you are assuming base is the same as windows and I don't think this works if I have a url param in url. Say `/img/profile.php?url=https://google.com/logo.svg`. – Ted Sep 17 '17 at 07:41
3

This is the function I use to resolve basic relative URLs:

function resolveRelative(path, base) {
    // Absolute URL
    if (path.match(/^[a-z]*:\/\//)) {
      return path;
    }
    // Protocol relative URL
    if (path.indexOf("//") === 0) {
      return base.replace(/\/\/.*/, path)
    }
    // Upper directory
    if (path.indexOf("../") === 0) {
        return resolveRelative(path.slice(3), base.replace(/\/[^\/]*$/, ''));
    }
    // Relative to the root
    if (path.indexOf('/') === 0) {
        var match = base.match(/(\w*:\/\/)?[^\/]*\//) || [base];
        return match[0] + path.slice(1);
    }
    //relative to the current directory
    return base.replace(/\/[^\/]*$/, "") + '/' + path.replace(/^\.\//, '');
}

Test it on jsfiddle: https://jsfiddle.net/n11rg255/

It works both in the browser and in node.js or other environments.

lovasoa
2

I found this blog post that suggests using an image element instead of an anchor:

http://james.padolsey.com/javascript/getting-a-fully-qualified-url/

That works to reliably expand a URL, even in IE6. But the problem is that the browsers that I have tested will immediately download the resource upon setting the image src attribute - even if you set the src to null on the next line.

I am going to give bobince's solution a go instead.

Jesse Hallett
0

If url does not begin with '/'

Take the current page's url, chop off everything past the last '/'; then append the relative url.

Else if url begins with '/'

Take the current page's url and chop off everything to the right of the single '/'; then append the url.

Else if url starts with # or ?

Take the current page's url and simply append url
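
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: `resolveAgainstPage` and its `pageUrl` parameter are hypothetical names (in a browser, `pageUrl` would be something like `window.location.href`), and the "#"/"?" check is done first because such URLs also "do not begin with '/'" and would otherwise be caught by the first rule.

```javascript
// Rough sketch of the steps described above; names are hypothetical.
// pageUrl stands in for the current page's URL (e.g. window.location.href).
function resolveAgainstPage(url, pageUrl) {
  var first = url.charAt(0);
  if (first === '#' || first === '?') {
    // Fragment or query only: simply append to the page URL, as described.
    // (This does not strip a query or fragment already present in pageUrl.)
    return pageUrl + url;
  }
  if (first === '/') {
    // Root-relative: keep only the scheme and host of the page URL.
    var origin = pageUrl.match(/^[a-z][a-z0-9+.-]*:\/\/[^\/]*/i);
    return (origin ? origin[0] : '') + url;
  }
  // Otherwise: chop off everything past the last '/' and append.
  return pageUrl.slice(0, pageUrl.lastIndexOf('/') + 1) + url;
}
```

As with the description it follows, this sketch does not handle scheme-relative URLs ("//host/path"), ".." segments, or page URLs whose query string contains a "/".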


Hope it works for you

geowa4
  • 2
    You forgot that URLs can begin with "//", which makes them scheme-relative. //foo.com/bar/ – Scott Wolchok Mar 07 '10 at 06:19
  • 1
    you also forgot the dotted relative ../../ syntax (whether this omission matters or no depends on what the output is required for) – hallvors Dec 05 '12 at 11:14
-1

If it runs in the browser, this sort of works for me..

  function resolveURL(url, base){
    if(/^https?:/.test(url))return url; // url is absolute
    // let's try a simple hack..
    var basea=document.createElement('a'), urla=document.createElement('a');
    basea.href=base, urla.href=url;
    urla.protocol=basea.protocol;// "inherit" the base's protocol and hostname
    if(!/^\/\//.test(url))urla.hostname=basea.hostname; //..hostname only if url is not protocol-relative  though
    if( /^\//.test(url) )return urla.href; // url starts with /, we're done
    var urlparts=url.split(/\//); // create arrays for the url and base directory paths
    var baseparts=basea.pathname.split(/\//); 
    if( ! /\/$/.test(base) )baseparts.pop(); // if base has a file name after last /, pop it off
    while( urlparts[0]=='..' ){baseparts.pop();urlparts.shift();} // remove .. parts from url and corresponding directory levels from base
    urla.pathname=baseparts.join('/')+'/'+urlparts.join('/');
    return urla.href;
  }
hallvors