
I'm running into a slight but inconvenient 'lag' when I attempt to populate a div created in JavaScript:

var el = document.createElement("div");
el.innerHTML = '<insert string-HTML code here>';

However, this is natural given the extent of the HTML code; sometimes it's more than 300,000 characters long. It is retrieved with GM_xmlhttpRequest, which sometimes takes 1000ms (give or take) to complete, plus the additional 500ms caused by the DOM-ification.

I have attempted to get rid of the massive amount of text using substr (granted, not the best idea that could've occurred to me), and it surprisingly worked for the most part, but at times the element would fail to accept the HTML code (probably due to unmatched tags).

I only need to access an extremely small amount of the text stored inside; regex is, per bobince, out of the question, so I figured this would be the best approach.

EDIT: I should mention that I understated what I meant by parsing the DOM: this 'text' is the textContent of quite a few elements, which I then modify. Therefore, regex isn't an option.

User2121315

3 Answers


While other answers focus on guessing whether your desire (parsing the DOM without string manipulation) makes sense, I will dedicate this answer to a comparison of reasonable DOM parsing methods.

For a fair comparison, I assume that we need the <body> element (as root container) for the parsed DOM. I have created a benchmark at http://jsperf.com/domparser-vs-innerhtml-vs-createhtmldocument.

// 100,000 <div> elements as the parsing workload.
var testString = '<body>' + Array(100001).join('<div>x</div>') + '</body>';

function test_innerHTML() {
    // Method 1: detached element + innerHTML (the question's approach).
    var b = document.createElement('body');
    b.innerHTML = testString;
    return b;
}
function test_createHTMLDocument() {
    // Method 2: parse inside a new, inert document.
    var d = document.implementation.createHTMLDocument('');
    d.body.innerHTML = testString;
    return d.body;
}
function test_DOMParser() {
    // Method 3: a dedicated HTML parser.
    return (new DOMParser).parseFromString(testString, 'text/html').body;
}

The first method is your current one. It is well supported across all browsers.
Even though the second method has the overhead of creating a full document, it has a big benefit over the first one: resources (such as images) are not loaded. The overhead of the document is marginal compared to the potential network traffic of the first method.

The last method is, as of writing, only supported in Firefox 12+ (no problem, since you're writing a Greasemonkey script), and it is the specific tool for this job (with the same advantages as the previous method). As its name implies, it is a DOM parser.

The benchmark shows that the original method is the fastest (4.64 ops/s), followed by the DOMParser method (4.22 ops/s). The slowest is the createHTMLDocument method (3.72 ops/s). The differences are minimal though, so I definitely recommend DOMParser for the reasons stated earlier.
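For instance, here is a minimal sketch of how the parsed document can then be queried for just the text you need; htmlString and the '.target' selector are placeholders for your fetched markup and whatever identifies your elements:

// Parse the fetched markup; no images or other resources are loaded.
var doc = (new DOMParser).parseFromString(htmlString, 'text/html');
// Touch only the handful of elements you actually care about.
var nodes = doc.querySelectorAll('.target'); // placeholder selector
for (var i = 0; i < nodes.length; i++) {
    nodes[i].textContent = nodes[i].textContent.trim();
}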


I know that you're using GM_xmlhttpRequest to fetch data. However, if you're able to use XMLHttpRequest instead, I suggest giving the following method a try: instead of getting plain text as the response, you can get a document:

var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://www.example.com/');
xhr.responseType = 'document';
xhr.onload = function() {
    var bodyElement = xhr.response.body; // xhr.response is a document object
};
xhr.send();

If the Greasemonkey script is active on a single page for a long time, you can still use this feature for other domains that do not support CORS: insert an iframe into the document whose URL is on the other domain (e.g. http://example.com/favicon.ico), and use it as a proxy (activate the GM script for this page as well). The overhead of inserting an iframe is significant, so this option is not viable for one-time requests.
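A rough sketch of that proxy idea, assuming the GM script is configured to run in both the top page and the inserted frame, and with http://example.com standing in for the other domain and /page-to-scrape.html for the resource you want:

if (window.top === window) {
    // Top page: insert a hidden frame on the other domain once.
    var frame = document.createElement('iframe');
    frame.style.display = 'none';
    frame.src = 'http://example.com/favicon.ico'; // any cheap URL on that domain
    document.body.appendChild(frame);
    window.addEventListener('message', function(event) {
        if (event.origin === 'http://example.com') {
            var html = event.data; // markup fetched by the frame
        }
    }, false);
} else {
    // Inside the frame: now same-origin, so a plain XMLHttpRequest works.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/page-to-scrape.html'); // placeholder path
    xhr.onload = function() {
        parent.postMessage(xhr.responseText, '*');
    };
    xhr.send();
}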

For same-origin requests, this option may be the best one (although not benchmarked, one can argue that returning a document directly, instead of going through an intermediate string, offers performance benefits). Unlike the DOMParser + text/html method, responseType = 'document' is supported by more browsers: Chrome 18+, Firefox 11+ and IE 10+.

Rob W
  • Interesting, I see a 200ms difference between `DOMParser` and `createElement`: the former takes about 422ms and the latter about 642ms. Ironically, the very tests hosted there suggest 3.11 ops/s for the very method I used originally. I assume it didn't account for images. – User2121315 Oct 07 '12 at 21:15
  • @User2121315 I have added another method to my answer. Might be interesting for you, depending on your situation. – Rob W Oct 07 '12 at 21:28
  • I would like to thank you for introducing me to `DOMParser`, it is certainly better than the method I've been using. As for the `XMLHttpRequest`, I'm afraid I'm bound by CORS but the iframe solution sounds very hackish and perhaps outweighs the benefit of those 450ms. – User2121315 Oct 07 '12 at 21:39

We'd need to know a bit more about your application, but when you're working with that much HTML content, you might just want to use an iframe. It's asynchronous, it won't stall JS code, and it won't introduce a plethora of potential debugging problems.
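A minimal sketch of the iframe route, assuming the big page is same-origin so its document is readable; the URL and selector are placeholders:

var frame = document.createElement('iframe');
frame.style.display = 'none';
frame.src = '/big-page.html'; // placeholder same-origin URL
frame.onload = function() {
    // The browser parses the HTML without blocking your script.
    var text = frame.contentDocument.querySelector('#target').textContent; // placeholder selector
    document.body.removeChild(frame);
};
document.body.appendChild(frame);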

It can be dangerous to populate an element with raw HTML from an XMLHttpRequest, mainly due to potential XSS vulnerabilities and next-to-impossible-to-fix HTML glitches. If at all possible, consider using a template (I believe jQuery offers some sort of templating solution) and loading a small amount of XML/JSON/etc. Only do that if using an iframe is out of the question, though.
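As an illustration of the template idea without any library, assuming the server could return a small JSON payload; the endpoint and field names are made up:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/data.json'); // placeholder endpoint returning e.g. {"title":"...","body":"..."}
xhr.onload = function() {
    var data = JSON.parse(xhr.responseText);
    var el = document.createElement('div');
    el.innerHTML = '<h2></h2><p></p>'; // fixed, trusted template markup
    // textContent escapes the payload, so no XSS and no glitched HTML.
    el.querySelector('h2').textContent = data.title;
    el.querySelector('p').textContent = data.body;
    document.body.appendChild(el);
};
xhr.send();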

Jeffrey Sweeney
  • At this point, other parts of my code use the DOM-method to access data, and the code is working asynchronously parallel to another main application. – User2121315 Oct 07 '12 at 21:02

If you have a giant amount of HTML that's taking a long time to put into the DOM, and you only want a small piece of it, the ways to make that faster are:

  1. Get your server to serve up only the parts of the HTML you actually want. This would save on both the network transfer time and the DOM parsing time.

  2. If you can't modify the server, then you need to manually parse some of the HTML to eliminate the parts you don't want, so that less of it is put into the DOM. A regex is one of the slower ways to search a giant string, so it's better to use something like .indexOf() if possible to identify the general area you are targeting. If there is a unique id or class and you know the general form of the HTML, you can use a faster algorithm like that to identify the target area (a rough sketch follows this list). But, without you disclosing the actual HTML to be parsed, we can't offer more specifics than that.
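A rough sketch of that .indexOf() approach, assuming html holds the fetched string and that the target element carries a unique, made-up marker id="target":

var marker = 'id="target"'; // placeholder for your unique marker
var start = html.indexOf(marker);
if (start !== -1) {
    // Back up to the opening of the tag, then keep a generous slice;
    // how big the window must be depends on your actual HTML.
    start = html.lastIndexOf('<', start);
    var el = document.createElement('div');
    // The parser silently drops any unmatched tags at the end of the slice.
    el.innerHTML = html.substr(start, 10000);
    var target = el.querySelector('#target');
}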

jfriend00
  • #1: Sadly, I'm not able to use this method. #2: I forgot to mention that, even though the text is small, it involves less-than-complex-but-more-than-simple manipulations of DOM. – User2121315 Oct 07 '12 at 21:04
  • @User2121315 - so without showing us the actual HTML, how do you expect us to offer any specifics about how to trim it quickly? – jfriend00 Oct 07 '12 at 21:16