Set origin when parsing HTML document

Question

My JavaScript app retrieves a webpage by XHR then parses it like this:

        var el = document.createElement( 'html' );
        el.innerHTML = xml;

        var links = el.getElementsByTagName( 'a' );

In the process, the links' href attributes get resolved relative to the current document, so I get links like http://localhost:8000/download.zip.

I tried hacking my way around it:

        if (link.origin === document.origin) {
            link.href = link.href.replace(link.origin, h.url.replace(/\/$/, ''));
        }

But that can't distinguish between a page at foo.org/bar (where the link should become foo.org/bar/download.zip) and a page at foo.org/bar.php (where it should become foo.org/download.zip), and I don't really want to go down the rabbit hole of working out exactly which substitutions to perform.
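To make the two cases concrete, here is a minimal sketch of how standard relative-URL resolution treats them. It assumes the `URL` constructor is available in the target environment and that the first page's URL ends in a slash; the URLs themselves are illustrative only.

```javascript
// Illustrative URLs only; the raw href is what the fetched page contains.
var rawHref = 'download.zip';

// A base whose path ends in "/" is treated as a directory,
// so the relative href is appended to it.
var fromDirectory = new URL(rawHref, 'http://foo.org/bar/').href;

// A base whose last path segment is a file has that segment
// replaced by the relative href.
var fromFile = new URL(rawHref, 'http://foo.org/bar.php').href;

console.log(fromDirectory); // http://foo.org/bar/download.zip
console.log(fromFile);      // http://foo.org/download.zip
```

This is the behaviour the string-replacement hack would have to reproduce by hand.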

I tried injecting either a <base href=...> element or an xml:base=... attribute into the document, but neither worked.

What am I missing? This seems like a common enough need.

I'm not using jQuery or anything similar (and can't).

Is this what you're looking for? http://stackoverflow.com/questions/1550901/how-to-get-raw-href-contents-in-javascript - light, Jul 22 '15 at 16:48
Huh... that's very interesting (accessing `getAttribute('href')` instead of `.href`). That still doesn't quite solve the problem of resolving the relative link correctly for a path like `.../foo.php`. - Steve Bennett, Jul 22 '15 at 17:00
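One possible way to combine the raw-attribute idea from the comments with proper resolution is sketched below. `resolveLinks` is a hypothetical helper, not something from the question; it assumes the `URL` constructor is available and that `pageUrl` is the absolute URL the page was actually fetched from.

```javascript
// Hypothetical helper: resolve each anchor's raw href against the URL
// the page was fetched from, instead of against the current document.
function resolveLinks(anchors, pageUrl) {
    var resolved = [];
    for (var i = 0; i < anchors.length; i++) {
        // getAttribute returns the href exactly as written in the markup.
        var raw = anchors[i].getAttribute('href');
        if (raw === null) {
            continue; // <a> element without an href attribute
        }
        try {
            resolved.push(new URL(raw, pageUrl).href);
        } catch (e) {
            // The URL constructor throws on input it cannot parse; skip it.
        }
    }
    return resolved;
}
```

With the code from the question, this might be called as `resolveLinks(el.getElementsByTagName('a'), h.url)`, assuming `h.url` holds the fetched page's URL. Absolute hrefs pass through unchanged, since the base is only used for relative ones.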


0 Answers