
In a Chrome extension, I need my background page to retrieve an HTML document and look through that document for text that appears within a specific anchor tag. The anchor tag does not have an ID attribute, but it is identifiable by the content of its href. For consistency with the rest of the code, I would like to retrieve this information through a document object created from the results of an XMLHttpRequest.

My problem is that when I call getElementsByTagName("a") on the DOM I have created, and then search through the href attributes of the resulting elements, only the tags with absolute URLs will return valid href values, whereas the tags with relative URLs return null href values. The anchor tag I need to find is one of those with a relative URL.

Here is the simplest form of the code that reproduces the error. Does anyone have any idea why this is happening or how to write a fix, preferably without abandoning DOM parsing?

function lookfor(linkContents, inURL) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function(data) {
        if (xhr.readyState == 4) {
            if (xhr.status == 200) {
                var doc = document.implementation.createHTMLDocument("");
                doc.documentElement.innerHTML = xhr.responseText;

                // Find Link in DOM of Document Created From HTTPRequest
                var found = -1;
                var links = doc.getElementsByTagName("a");
                console.log(links);
                for(var i = 0; i < links.length; i++) {
                    if (links[i].href) {
                        console.log(i + " " + links[i].href);
                        if (links[i].href.indexOf(linkContents) > -1) {
                            found = i;
                        }
                    }
                }
                if (found > -1) {
                    alert(links[found].innerHTML);
                }
            }
        }
    }
    xhr.open('GET', inURL, true);
    xhr.send();
}

[Update]

I was able to kludge over the problem for now with the following code, based on this answer: "How do I do OuterHTML in firefox?"

function getHref(anchor) {
    var href =
        ((new XMLSerializer().serializeToString(anchor) || "")
            .match(/href=("[^"'<>\s]+"|'[^"'<>\s]+'|[^"'<>\s]+)/i) || [""])[0]
                .replace(/(href=|'|")/ig, "")
    ;
    if (href != "") return href;
}
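
A possibly simpler route than serializing and regex-matching, sketched here rather than tested inside the extension: read the raw attribute with getAttribute("href") and resolve it against the URL that was requested. This assumes getAttribute returns the attribute text as written even when the href property comes back empty, and that the URL constructor is available in the Chrome version in use. The names getRawHref, resolveHref and findLinkIndex are hypothetical helpers, not part of the original code.

```javascript
// Hypothetical helper: return the href attribute exactly as written in the
// markup, or "" when the anchor has no href attribute at all.
function getRawHref(anchor) {
    var raw = anchor.getAttribute("href");
    return raw === null ? "" : raw;
}

// Hypothetical helper: resolve a possibly relative href against the URL the
// document was fetched from (inURL in lookfor). Assumes the URL constructor
// exists; values that cannot be parsed are returned unchanged.
function resolveHref(rawHref, baseURL) {
    try {
        return new URL(rawHref, baseURL).href;
    } catch (e) {
        return rawHref;
    }
}

// Same search as the loop in lookfor, but driven by the raw attribute.
// Like the original loop, it keeps the index of the last matching link.
function findLinkIndex(links, linkContents, baseURL) {
    var found = -1;
    for (var i = 0; i < links.length; i++) {
        var raw = getRawHref(links[i]);
        if (raw === "") {
            continue; // anchor without an href attribute
        }
        if (resolveHref(raw, baseURL).indexOf(linkContents) > -1) {
            found = i;
        }
    }
    return found;
}
```

Inside lookfor this would be called as findLinkIndex(doc.getElementsByTagName("a"), linkContents, inURL), with the result used in place of found.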

Interestingly enough, it won't work with type-checking either: the same relative links that won't produce an href value also don't have typeof 'Anchor'.

underemployedJD
