47

Context: I have a web application that processes and shows huge log files. They're usually only about 100k lines long, but they can be up to 4 million lines or more. To be able to scroll through the log file (both user-initiated and via JavaScript) and filter the lines with decent performance, I create a DOM element for each line as soon as the data arrives (as JSON via Ajax). I found this better for performance than constructing the HTML at the back end. Afterwards I save the elements in an array and only show the lines that are visible.

For up to 100k lines this takes only a few seconds, but larger files take much longer: up to one minute for 500k lines (not including the download). I wanted to improve the performance further, so I tried using HTML5 Web Workers. The problem is that I can't create elements in a Web Worker, not even outside the DOM. So I ended up doing only the JSON-to-HTML conversion in the Web Workers and sending the result to the main thread, where the elements are created and stored in an array. Unfortunately this worsened the performance, and it now takes at least 30 seconds more.

Question: So is there any way, that I'm not aware of, to create DOM elements, outside the DOM tree, in a Web Worker? If not, why not? It seems to me that this can't create concurrency problems, as creating the elements could happen in parallel without problems.

Joren Van Severen
  • Have you thought of a solution that requests the log lines on demand? Parsing 4M lines of log files at once is a heavy task, and even if you could use the WebWorker effectively here, you wouldn't get the performance boost you are looking for. I'd recommend requesting only a batch of lines at a time and processing them the way infinite-scroll pages do. – Raoul Aug 05 '13 at 11:50
  • Yes, but that would make it very difficult to implement the filter-options. I would also have to store the JSON format, as parsing it to JSON in the back-end also takes a minute and there is no way to do that in pieces. It would also decrease the scrolling and filtering performance, which atm takes a few milliseconds or even nanoseconds. – Joren Van Severen Aug 05 '13 at 11:58

10 Answers

18

Alright, I did some more research with the information @Bergi provided and found the following discussion on the W3C mailing list:

http://w3-org.9356.n7.nabble.com/Limited-DOM-in-Web-Workers-td44284.html

And the excerpt that answers why there is no access to the XML parser or DOM parser in the Web Worker:

You're assuming that none of the DOM implementation code uses any sort of non-DOM objects, ever, or that if it does those objects are fully threadsafe. That's just not the case, at least in Gecko.

The issue in this case is not the same DOM object being touched on multiple threads. The issue is two DOM objects on different threads both touching some global third object.

For example, the XML parser has to do some things that in Gecko can only be done on the main thread (DTD loading, offhand; there are a few others that I've seen before but don't recall offhand).

There is however also a workaround mentioned, which is using a third-party implementation of the parsers, of which jsdom is an example. With this you even have access to your own separate Document.

Joren Van Severen
  • 1
    The discussion on the mailing list appears to be mere handwaving. Of course if the *current* version of Gecko can't do it, that's fine. But that doesn't mean that a *future* version should have this limitation as well. – Pacerier May 06 '14 at 13:42
  • There appears to be enough interest in it that it may someday become a possibility: http://www.2ality.com/2012/11/canvas-in-workers.html – Andy Oct 03 '15 at 19:06
  • That's why virtual DOM is so great. You can run complex checking and rendering operations in other threads. – Pikachu Oct 31 '15 at 07:52
  • I wanted to build a huge HTML table in a worker to make the page more responsive, so `jsdom` looked promising -- there is even @types/jsdom. However, I hit lots of snags, including https://stackoverflow.com/questions/68592278/sharedarraybuffer-is-not-defined and finally just generated raw HTML in the worker. – Fuhrmanator Oct 01 '21 at 19:17
12

So is there any way, that I'm not aware of, to create DOM elements, outside the DOM tree, in a Web Worker?

No.

Why not? It seems to me that this can't create concurrency problems, as creating the elements could happen in parallel without problems.

Not for creating them, you're right. But for appending them to the main document - they would need to be sent to a different memory (like it's possible for blobs) so that they're inaccessible from the worker thereafter. However, there's absolutely no Document handling available in WebWorkers.

I create a DOM element for each line as soon as the data arrives (in JSON via ajax). Afterwards I save the elements in an array and I only show the lines that are visible.

Constructing over 500k DOM elements is the heavy task. Try to create DOM elements only for the lines that are visible. To improve performance and show the first few lines faster, you might also chunk their processing into smaller units and use timeouts in between. See "How to stop intense Javascript loop from freezing the browser".
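The chunking idea can be sketched roughly as follows. This is only a minimal illustration, not code from the linked question: `processInChunks` and its parameters are hypothetical names, and the `schedule` argument exists only so the sketch can run outside a browser.

```javascript
// Hypothetical helper: process `items` in slices of `chunkSize`, yielding to
// the event loop between slices so the page can paint and handle input.
// `schedule` defaults to setTimeout(fn, 0); it is a parameter only so the
// sketch can be exercised without a browser.
function processInChunks(items, chunkSize, processItem, onDone,
                         schedule = (fn) => setTimeout(fn, 0)) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      processItem(items[index], index);
    }
    if (index < items.length) {
      schedule(runChunk); // give the browser a turn, then continue
    } else if (onDone) {
      onDone();
    }
  }

  runChunk();
}
```

In the log viewer, `processItem` would be whatever builds or stores one line, and the chunk size (a few thousand lines, say) is something to tune by measurement rather than a known good value.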

Community
Bergi
  • One problem with creating the DOM elements on the fly is that atm I use the elements to hold several data values, like the index. Because the index changes when the lines are filtered, I keep references to the element in places where that index is needed. I used to update it in those places too, but that took too long as there are a lot of them. If I have enough time left I might try your way and see if I can store or refer to the indices (and other values) in another way. – Joren Van Severen Aug 07 '13 at 07:31
  • 1
    How about you keep all the lines as JavaScript Objects, and only keep as many DOM objects as there is space on the screen? When you scroll, you change the mapping DOM object -> JavaScript object, rather than change which DOM objects are visible. This will let you filter and scroll very quickly, and avoids creating too many DOM objects. – Jon Watte Nov 25 '14 at 23:01
  • @JonWatte: Yes, that's what my second suggestion means. And it is what high-performant table scroller libraries *do*. – Bergi Nov 26 '14 at 01:16
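A rough sketch of the mapping described in these comments, with all names (`visibleRange`, `renderWindow`, the `text` property) chosen for illustration only and a fixed row height assumed: the lines stay plain objects, and a small fixed pool of row elements is re-pointed at whichever lines are in view.

```javascript
// Hypothetical helper: which line indices should be on screen?
// Assumes every row has the same height. `last` is exclusive.
function visibleRange(scrollTop, rowHeight, viewportHeight, totalLines) {
  const first = Math.max(0, Math.min(totalLines, Math.floor(scrollTop / rowHeight)));
  const count = Math.ceil(viewportHeight / rowHeight) + 1; // +1 for a partly visible row
  const last = Math.min(totalLines, first + count);
  return { first, last };
}

// Re-point a fixed pool of row elements at the lines in view.
// `rows` is an array of already-created elements; `lines` holds plain
// objects with a `text` property (an assumed shape).
function renderWindow(rows, lines, range) {
  for (let i = 0; i < rows.length; i++) {
    const lineIndex = range.first + i;
    const inRange = lineIndex < range.last && lines[lineIndex] !== undefined;
    rows[i].textContent = inRange ? lines[lineIndex].text : '';
    rows[i].hidden = !inRange;
  }
}
```

On each scroll event you would recompute the range and call `renderWindow` again, so the number of DOM elements stays bounded by the viewport size no matter how long the log is.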
4

You have to understand the nature of a webworker. Programming with threads is hard, especially if you're sharing memory; weird things can happen. JavaScript is not equipped to deal with any kind of thread-like interleaving.

The approach of webworkers is that there is no shared memory. This obviously leads to the conclusion that you can't access the DOM.

Halcyon
  • 9
    I realize that, but for creating elements outside the DOM tree there would be no need for shared memory, no? The only thing that needs to be shared is the logic to create the elements, which, I guess, can be copied if necessary. – Joren Van Severen Aug 05 '13 at 12:01
  • You access the `DOM` through `document` which is a global variable, so .. no :( – Halcyon Aug 05 '13 at 12:02
  • 2
    @FritsvanCampen - You can access the DOM API through *any* document object, not just the one attached to the window object. So it doesn't have to be a global variable. – Alohci Aug 05 '13 at 12:36
  • @Alohci I understand that but how are you going to get a new document and how are you going to transfer the nodes? I doubt `importNode` will work in this. Also, making DOM Nodes isn't that expensive, just transfer some data back and do the DOM conversion in the parent script. – Halcyon Aug 05 '13 at 12:45
  • @FritsvanCampen you cannot access the native DOM API implemented by the browser from a WebWorker; on this point you were right. However, Alohci's approach was a more theoretical one pointing to custom JS implementations, at least that is how I understood him. About transferring the nodes between the WebWorker and the original document: this can only be done via the message system and requires some kind of serialization. HTML is the obvious choice, but according to the OP this technique didn't turn out so well. – Raoul Aug 05 '13 at 12:55
  • @FritsvanCampen - That's probably the thinking, yes. To make it worthwhile, the serialization/deserialization at the postMessage interface would have to be specially optimized for DOM nodes. But I'm guessing that browsers could probably do that quite effectively, given that we know innerHTML is generally faster than constructing a DOM tree node by node, and that an optimized deserialization would have less work to do than innerHTML. – Alohci Aug 05 '13 at 12:55
  • @Alohci you're right, innerHTML should perform better than the more expressive DOM manipulation methods; see this benchmark for a [proof](http://www.quirksmode.org/dom/innerhtml.html). However, I don't think that's the bottleneck here, since the OP stated that scrolling is not the problem, but parsing the JSON into DOM seems to be very slow. – Raoul Aug 05 '13 at 13:00
  • By using document.createElement and innerHTML and reducing the garbage collection I got it down to 20-30 seconds for 500k lines (without Web Workers). – Joren Van Severen Aug 05 '13 at 15:30
  • @FritsvanCampen like many things, creating DOM elements isn't that expensive, no, but creating hundreds of thousands of them is. – Joren Van Severen Aug 05 '13 at 15:32
4

There is no direct way to access the DOM through Web Workers. I recently released @cycle/sandbox. It is still WIP, but it shows that with the Cycle JS architecture it is fairly straightforward to declare UI behaviour in the Web Worker. The actual DOM is only touched in the main thread, but event listeners and DOM updates are indirectly declared in the worker, and a synthesized event object is sent when something happens on those listeners. Furthermore, it is straightforward to mount these sandboxed Cycle Components side by side with regular Cycle Components.

http://github.com/aronallen/-cycle-sandbox/

Aron Allen
3

I don't see any reason why you can't construct html strings using web-workers. But I also don't think there would be much of a performance boost.

This isn't related to Web Workers, but it relates to the problem you're trying to solve. Here are some things that might help speed things up:

  1. Use DocumentFragments. Add elements to them as the data comes in, and add the fragments to the DOM at an interval (like once a second). This way you don't have to touch the DOM (and incur a redraw) every time a line of text is loaded.

  2. Do loading in the background, and only parse the lines as the user hits the bottom of the scroll area.

posit labs
  • The first piece of advice, about using timed functions, will have the opposite effect of what you might expect. Browsers don't redraw or render after every single DOM manipulation, but whenever the call stack becomes empty and before the next asynchronous task is fetched from the queue. That means you will produce many more render calls than needed when you split the task into smaller asynchronous portions that are all added to the queue. This relates to the JavaScript event loop. – Raoul Aug 05 '13 at 18:06
  • 1
    Your second recommendation is commonly known as fetching data on-demand. That means fetch and process them only when they're needed. And imho. this would be the most reliable solution to this problem, however the OP already complained about other issues depending on this solution in the comments under his question. – Raoul Aug 05 '13 at 18:12
  • The way he is adding elements to the dom, he is incurring the maximum number of redraws. DocumentFragments can be created and manipulated off-screen. By skipping frames where redraws might have occurred, he saves some processor cycles. I'm not suggesting that he load the data on demand. I'm suggesting that he load the data in the background, and only parse it into dom elements as they are needed. As for filtering, I would recommend applying classes to elements, and setting their display according to the rules of the filter – posit labs Aug 05 '13 at 20:32
  • Ah okay, I got a little confused by the terms "Redrawing" and "Rerendering". Let's get some clarification in here: "Redrawing" refers to a process where the browser is actually updating the screen, while "Rerendering" can happen in the background. For example, if you change the size of an element that is currently attached to the document and visible, and then access the "offsetWidth" property, the browser needs to perform "Rerendering" to determine the new computed property, but the change won't directly be visible on the screen. – Raoul Aug 05 '13 at 21:56
  • Document fragments avoid this issue, because the offsetHeight property of an element within a fragment will always default to zero. So you're right, you can save "Rerendering" calls. But again, I don't think this is the bottleneck of his algorithm, since there are only very few properties that require a "Rerendering" call, and they're all layout-related and shouldn't appear during transpiling from JSON to HTML anyway. – Raoul Aug 05 '13 at 22:04
  • 1
    Creating DOM elements does not incur redraws or rerendering, not until they're added to the actual DOM tree. Using DocumentFragments or appending the elements to another detached element is basically the same. – Joren Van Severen Aug 05 '13 at 22:04
  • Back to the on-demand debate. I see the difference: you are basically talking about fetching the data during initialisation, but leaving the processing (~= transpiling) until the lines are actually needed. That seems nice to me and should avoid the issues the OP mentioned above. I think caching already processed lines could slightly improve this solution. Sorry for the long answer. – Raoul Aug 05 '13 at 22:05
  • Btw, quoting myself here "Afterwards I save the elements in an array and I only show the lines that are visible.", by which I meant is when the user scrolls I add the corresponding lines, already parsed as elements, to the DOM, which is basically what you're suggesting. – Joren Van Severen Aug 05 '13 at 22:14
  • Now about the on-demand approach: I thought about that a lot when I was implementing it and experimented with several things. For example, filtering like you suggest is too slow; the jQuery selector for classes is extremely slow and needs all the elements to be in the DOM tree. The only way I found to make it extremely fast and still keep all the elements out of the DOM tree, except the visible ones, is to keep an array of indices of the lines which aren't filtered out. To be able to update this quickly I also created arrays with the indices for each type, so I can easily subtract or add them. – Joren Van Severen Aug 05 '13 at 22:18
  • What this filtering issue means is that I need to process all the records at once to do it the way I just explained. There are also other issues for which I need more information than just a few lines. In the end it would make the experience slower after the processing, which is not what I want. The user is prepared to wait some time during processing, but wants it to work fast after it, I'm just trying to minimize that processing time. – Joren Van Severen Aug 05 '13 at 22:22
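The index-array filtering described in these comments might look something like the sketch below. It is a guess at the structure, not the OP's actual code: `indexByType`, `visibleIndices` and the `type` property are all assumed names.

```javascript
// Group line indices by type once, up front. Each per-type array is ascending
// because the lines are visited in order. `type` is an assumed property name.
function indexByType(lines) {
  const byType = new Map();
  lines.forEach((line, i) => {
    if (!byType.has(line.type)) byType.set(line.type, []);
    byType.get(line.type).push(i);
  });
  return byType;
}

// Combine the index arrays of the enabled types into one ascending array of
// line indices that pass the filter.
function visibleIndices(byType, enabledTypes) {
  const merged = [];
  for (const type of enabledTypes) {
    const indices = byType.get(type) || [];
    for (const i of indices) merged.push(i);
  }
  // A k-way merge of the already sorted arrays would avoid this sort;
  // the sort just keeps the sketch short.
  return merged.sort((a, b) => a - b);
}
```

Toggling a filter then only recomputes the array of visible indices; no DOM elements need to be touched until the visible window is redrawn.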
2

You have a couple of anti-patterns in your design:

  1. Creating a DOM object has considerable overhead, and you are creating potentially millions of them at once.
  2. Trying to get a web worker to manage the DOM is exactly what web workers are not for. They do everything else so the DOM event loop stays responsive.

You can use a cursor pattern to scroll through arbitrarily large sets of data.

  1. DOM posts a message to worker with start position and number of lines requested (cursor).
  2. Web worker random accesses logs, posts back the fetched lines (cursor data).
  3. DOM updates an element with the async cursor response event.

This way, the heavy lifting is done by the worker, whose event loop is blocked during the fetch instead of the DOM, resulting in happy non-blocked users marvelling at how smooth all your animations are.
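The three steps above can be sketched as follows. This is a minimal illustration under assumptions: the message shape `{ start, count }` and the names `readCursor`, `installCursorHandler`, `requestLines` and `render` are invented for the example, and the worker is assumed to already hold the parsed lines in memory.

```javascript
// Worker side: return the requested window of lines. Kept as a pure function
// so it can be exercised outside a worker.
function readCursor(lines, start, count) {
  const from = Math.max(0, Math.min(start, lines.length));
  const to = Math.min(lines.length, from + Math.max(0, count));
  return { start: from, lines: lines.slice(from, to) };
}

// Worker side wiring. `scope` is the worker's global object (`self`).
function installCursorHandler(scope, lines) {
  scope.onmessage = (event) => {
    const { start, count } = event.data;
    scope.postMessage(readCursor(lines, start, count));
  };
}

// Main thread wiring. `worker` is a Worker instance and `render` is a
// hypothetical callback that updates the visible rows.
function requestLines(worker, start, count, render) {
  worker.onmessage = (event) => render(event.data.start, event.data.lines);
  worker.postMessage({ start, count });
}
```

Inside the worker script you would call `installCursorHandler(self, lines)` once the log has been loaded, and the page would call `requestLines` whenever the scroll position changes.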

Dominic Cerisano
1

According to https://developer.mozilla.org/en-US/docs/Web/Guide/Performance/Using_web_workers there's no access to the DOM from a web worker unfortunately.

Strille
0

So you can't directly create DOM in a webworker - however, there may be another option to do a fair bit of your processing outside the main thread.

Check out this jsPerf I just created: http://jsperf.com/dom-construction-obj-vs-str

Essentially, you could emit POJSOs (plain old JavaScript objects) that carry all the same values you would get from a DOM, and convert them to DOM objects after receiving the message (this is what you're doing when you get HTML back, after all; POJSOs just have lower overhead, by virtue of not requiring further string processing). In this way you could even do things like emit event listeners and such (by, say, prefixing the event name with '!', and having the value map to some template-supplied view argument).

Meanwhile, without the DOM parser available, you'll need your own thing to convert a template as-needed, or to compile one to a format that's fast.
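As a rough illustration of the object-to-DOM idea (not the code from the jsPerf test), the worker could emit plain descriptions and the main thread could turn them into nodes. The `{ tag, attrs, children }` shape and the names `describeLine` and `toDom` are assumptions made for this sketch.

```javascript
// Worker side (no DOM needed): describe a log line as a plain object.
// `level` and `text` are assumed properties of a parsed line.
function describeLine(line, index) {
  return {
    tag: 'div',
    attrs: { class: 'log-line ' + line.level, 'data-index': String(index) },
    children: [line.text],
  };
}

// Main thread: turn a description into real nodes. `doc` is normally the
// global `document`; passing it in keeps the function easy to test.
function toDom(spec, doc) {
  if (typeof spec === 'string') return doc.createTextNode(spec);
  const el = doc.createElement(spec.tag);
  for (const [name, value] of Object.entries(spec.attrs || {})) {
    el.setAttribute(name, value);
  }
  for (const child of spec.children || []) {
    el.appendChild(toDom(child, doc));
  }
  return el;
}
```

Because the descriptions are plain objects, they can be sent through `postMessage` as they are; the main thread would call `toDom(spec, document)` only for the lines it actually needs to show.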

Fordi
  • Are you sure that `factory.removeChild(ret);` does not affect your test? – Kuba Wyrostek Nov 03 '16 at 11:04
  • Doesn't seem to. JSPerf is a little broke these days, so I couldn't edit the old one: https://jsperf.com/dom-construction-3 – Fordi Nov 21 '16 at 20:41
  • Incidentally, even adding the step of parsing the JSON from a string doesn't make the Object->DOM case worse than the standard HTML parsing case. https://jsperf.com/dom-construction-4. – Fordi Nov 21 '16 at 20:48
0

No, you can't create DOM elements in a web worker, but you can create a function on the main thread that accepts the posted message from the web worker and creates the DOM elements. I think the design that you're looking for is called array chunking, and you would need to combine that with the web worker design pattern.

Paul Roub
0

Update for 2022 (actually available in Chrome since 2018):

If you are OK with displaying your logs in a canvas element, you could use the new OffscreenCanvas API.

The OffscreenCanvas interface provides a canvas that can be rendered off screen. It is available in both the window and worker contexts.

You could then asynchronously display frames produced in the Worker back to a canvas element on the main thread.
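A minimal sketch of how that could be wired up, assuming the worker already holds the log lines as strings. `transferControlToOffscreen`, `getContext('2d')`, `clearRect` and `fillText` are the standard canvas calls; the function names, the message shape and the fixed `rowHeight` are assumptions for illustration.

```javascript
// Main thread: hand control of a <canvas> element to a worker.
function startOnMainThread(canvas, worker) {
  const offscreen = canvas.transferControlToOffscreen();
  // The OffscreenCanvas is transferred to the worker, not copied.
  worker.postMessage({ canvas: offscreen }, [offscreen]);
}

// Worker: draw a window of lines onto a 2D context and report how many rows
// were drawn. `rowHeight` is an assumed fixed line height in pixels.
function drawLines(ctx, lines, first, rowCount, rowHeight) {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  const start = Math.max(0, first);
  const last = Math.min(lines.length, start + rowCount);
  let drawn = 0;
  for (let i = start; i < last; i++) {
    ctx.fillText(lines[i], 0, (i - start + 1) * rowHeight);
    drawn++;
  }
  return drawn;
}

// Worker wiring. `scope` is the worker's global object (`self`).
function installCanvasHandler(scope, lines, rowCount, rowHeight) {
  scope.onmessage = (event) => {
    const ctx = event.data.canvas.getContext('2d');
    drawLines(ctx, lines, 0, rowCount, rowHeight);
  };
}
```

Scrolling would then be a matter of posting a new `first` index to the worker and redrawing; that part is left out here.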

More examples here.

Erez Cohen