
So I'm grabbing RSS feeds via AJAX. After processing them, I have an HTML string that I want to manipulate using various jQuery functions. To do this, I need a tree of DOM nodes.

I can pass an HTML string to the jQuery() function.
I can set it as the innerHTML of some hidden node and use that.
I have even tried using Mozilla's non-standard range.createContextualFragment().

The problem with all of these solutions is that when my HTML snippet has an <img> tag, Firefox dutifully fetches whatever image is referenced. Since this is background processing that isn't displayed to the user, I'd like to get a DOM tree without the browser loading all the images it contains.

Is this possible with JavaScript? I don't mind if it's Mozilla-only, as I'm already using JavaScript 1.7 features (which seem to be Mozilla-only for now).

gfxmonk

3 Answers


The obvious answer is to parse the string and remove the src attributes from img tags (and similarly for other external resources you don't want to load). But you'll have already thought of that, and I'm sure you're looking for something less troublesome. I'm also assuming you've already tried removing the src attribute after having jQuery parse the string but before appending it to the document, and found that the images are still being requested.

I'm not coming up with anything else, but you may not need to do full parsing; this replacement should do it in Firefox with some caveats:

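    // A regex with the g flag replaces every "<img "; a plain string pattern would replace only the first.
    thestring = thestring.replace(/<img /g, "<img src='' ");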

The caveats:

  • This appears to work in the current Firefox. That doesn't mean that subsequent versions won't choose to handle duplicated src attributes differently.
  • This assumes the literal string "<img " (with the trailing space) only ever appears at the start of an img tag. That isn't a reasonable general purpose assumption: that string could appear in an attribute value on a sufficiently...interesting...page, especially in an inline onclick handler like this: <a href='#' onclick='$("frog").html("<img src=\"spinner.gif\">")'> (Although in that example, the false positive replacement is harmless.)

This is obviously a hack, but in a limited environment with reasonably well-known data...
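A minimal sketch of the same trick wrapped in a reusable function, assuming (as the caveats above say) that the browser's HTML parser keeps the first of two duplicated src attributes. `blankImageSources` is a hypothetical name; unlike the literal string, this regex also catches an upper-case `<IMG` and any whitespace (tab, newline) after the tag name.

```javascript
// Hypothetical helper: inserts an empty src='' in front of every img tag's
// own attributes, so the parser sees src='' first and ignores the duplicate.
function blankImageSources(html) {
  // <img followed by one whitespace character, any letter case, every match.
  // The captured whitespace ($1) is put back after the inserted attribute.
  return html.replace(/<img(\s)/gi, "<img src=''$1");
}
```

The false-positive caveat still applies: text such as "<img " inside an attribute value is rewritten too.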

T.J. Crowder
  • @T. J. - You're right, works in every browser except Firefox; seeing if there's another way. Also, to make yours more robust, I'd suggest replacing just `src=` with `blah=`; this would eliminate JavaScript fetches too. – Nick Craver Feb 20 '10 at 12:35
  • @Nick: The parse-then-remove works except in FF? Heh. Classic, everything but the one browser the OP wanted to use. :-) I didn't try to muck about with the `src=` because it makes the replacement *much* more complicated, have to be sure that it's appearing inside a tag, etc., etc. – T.J. Crowder Feb 20 '10 at 12:38
  • @T.J. no no, my solution worked everywhere except FF which is why I didn't see, but yes same irony :) – Nick Craver Feb 20 '10 at 12:40
  • cheers :) I've ended up modifying src= to _src=, since I want to (at some point) reverse the process and get the image urls back. And given that I'm reversing it before it is eventually displayed, the false positives should be negligible. – gfxmonk Feb 20 '10 at 23:56
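gfxmonk's reversible variant (rename `src=` to `_src=`, then undo it before display) can be sketched as a pair of functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, a tag is assumed to end at the first `>` (so an attribute value containing `>` would confuse it), and `srcset` is deliberately left alone, so a browser that supports `srcset` would still fetch those candidates.

```javascript
// Hypothetical helpers: rename src= to _src= inside <img ...> tags only,
// and reverse the rename later.
var IMG_TAG = /<img\b[^>]*>/gi; // assumes no ">" inside attribute values

function disableImageSources(html) {
  return html.replace(IMG_TAG, function (tag) {
    // " src=" (whitespace before, optional whitespace before "=") -> " _src=".
    // "srcset=" and "data-src=" do not match this pattern.
    return tag.replace(/(\s)src(\s*=)/gi, "$1_src$2");
  });
}

function restoreImageSources(html) {
  return html.replace(IMG_TAG, function (tag) {
    return tag.replace(/(\s)_src(\s*=)/gi, "$1src$2");
  });
}
```

Because only the contents of img tags are rewritten, a `src=` that appears in ordinary text or in other tags is left alone, which keeps the false positives limited to img tags that legitimately carry an `_src` attribute.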

You can use the DOM parser to manipulate the nodes. Just replace the src attributes, store their original values and add them back later on.

Sample:

    (function () {
        var s = "<img src='http://www.google.com/logos/olympics10-skijump-hp.png' /><img src='http://www.google.com/logos/olympics10-skijump-hp.png' />";
        var parser = new DOMParser();
        // Parsed as XML, so the snippet must be well-formed (otherwise you get a <parsererror> document).
        var dom = parser.parseFromString("<div id='mydiv' >" + s + "</div>", "text/xml");
        var imgs = dom.getElementsByTagName("img");
        var stored = [];
        for (var i = 0; i < imgs.length; i++) {
            var img = imgs[i];
            stored.push(img.getAttribute("src")); // remember the original URL
            img.setAttribute("myindex", i);        // so the URL can be matched up later
            // removeAttribute rather than setAttribute("src", null): the latter sets
            // src="null", which the browser would request as a relative URL.
            img.removeAttribute("src");
        }
        $(document.body).append(new XMLSerializer().serializeToString(dom));
        alert("Images appended");
        window.setTimeout(function () {
            alert("loading images");
            $("#mydiv img").each(function () {
                this.src = stored[$(this).attr("myindex")];
            });
            alert("image sources restored"); // loading continues asynchronously
        }, 2000);
    })();
Andras Vass
  • Thanks, that's a great answer. The only problem (for my case) is that it only supports valid XML, which is probably not going to work for arbitrary RSS feed content (how I wish it would). But for others if you can ensure valid XML, you ought to use this ;) – gfxmonk Feb 20 '10 at 23:53
  • "It is very easy to parse RSS feeds with Javascript, since RSS feeds are just plain XML." From "Parsing RSS feeds with AJAX/Javascript": http://www.captain.at/howto-ajax-parse-rss.php :-) – Andras Vass Feb 21 '10 at 00:31
  • yes, the RSS feed is valid XML. However the entry contents is just CDATA containing whatever mish-mash of HTML the author published as the "contents" of the entry. That is (sadly) the part I wish to parse. – gfxmonk Feb 22 '10 at 11:25

Parse the string with DOMParser using the "text/html" type:

    var parser = new DOMParser();
    var htmlDoc = parser.parseFromString(htmlString, "text/html");
    var jdoc = $(htmlDoc);
    console.log(jdoc.find('img'));

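If you watch the network requests, you'll notice that no image requests are made, even though the HTML string is parsed and wrapped by jQuery. The document that DOMParser returns isn't associated with a browsing context, so the browser doesn't fetch its images.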

argyle
  • @gfxmonk: The problem with this is that parsing HTML with `DOMParser` isn't supported in anything prior to IE10, and not supported in Safari at all. IE10 finally adds HTML parsing, so if the Safari folks would get on board, it might be viable in a couple of years. But if your target browsers don't include Safari or IE8 or IE9, it works. jeromeyers - When there are *significant* support issues, always best to mention that in the answer. – T.J. Crowder Aug 02 '14 at 07:39
  • @T.J.Crowder Thanks for the heads up. I was unaware of the browser compatibility issue. So, a cross-browser extension for jquery would look something like your answer for older browsers and safari and my answer for newer browsers? – argyle Aug 23 '14 at 18:01
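The cross-browser approach suggested in that last comment could start with a feature test. A minimal sketch, assuming (as the widely used DOMParser "text/html" polyfills do) that engines without HTML support either throw or return null for that type. The function names and the strategy labels are hypothetical; the constructor is passed in so the check can run outside a browser.

```javascript
// Hypothetical feature test: does this DOMParser accept "text/html"?
function supportsHtmlParsing(DOMParserCtor) {
  if (typeof DOMParserCtor !== "function") return false; // no DOMParser at all
  try {
    // Engines without HTML support may throw here or return null.
    return !!new DOMParserCtor().parseFromString("", "text/html");
  } catch (e) {
    return false;
  }
}

// Pick DOMParser where possible, else fall back to a string rewrite
// (e.g. the src-renaming hack) before handing markup to jQuery.
function chooseParseStrategy(DOMParserCtor) {
  return supportsHtmlParsing(DOMParserCtor) ? "domparser" : "string-rewrite";
}
```

In a page this would be called as `chooseParseStrategy(window.DOMParser)`.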