
The problem

I have developed an extension that intercepts web requests, fetches the HTML of the page each request originated from, and processes it. I used DOMParser to parse the HTML, and I have realised that DOMParser is causing a massive memory leak, which eventually causes the Chrome extension to crash.

This is the code that causes the issues. https://gist.github.com/uche1/20929b6ece7d647250828c63e4a2ffd4

What I've tried

Dev Tools Recorded Performance

I recorded the extension's performance while it was intercepting requests and noticed that every call to DOMParser.parseFromString created more nodes and documents, none of which were ever destroyed.

Dev tools screenshot https://i.stack.imgur.com/C8mVi.png

Task Manager Memory Footprint

I looked at Chrome's task manager and saw that the extension had a huge memory footprint that wouldn't decrease over time (even though garbage collection should kick in after a while). When the memory footprint gets too large, the extension crashes.

Task manager memory footprint screenshot https://i.stack.imgur.com/F3f3k.png

Heap snapshots

I took before and after snapshots of the heap, and the issue seems to originate from allocated HTMLDocuments that aren't being garbage collected.

Snapshot (before) https://i.stack.imgur.com/17LYh.png

Snapshot (after) https://i.stack.imgur.com/Jtg1X.png

Expected outcome

I would like to understand why DOMParser is causing such memory issues, why the parsed documents aren't being cleaned up by the garbage collector, and what I can do to resolve this.

Thanks

Coder Guy

3 Answers


I have resolved the problem. It seems the DOMParser class, for some reason, keeps references to the HTML documents it has parsed in memory and doesn't release them. Because my extension is a Chrome extension that runs in a long-lived background page, this exacerbates the problem.

The solution was to parse the HTML a different way, using a `<template>` element:

let parseHtml = (html) => {
    // Parsing via a detached <template> element does not appear to retain
    // documents the way DOMParser does in a background page.
    let template = document.createElement('template');
    template.innerHTML = html;
    return template;
}
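One caveat worth noting (my addition, not part of the original answer): HTML assigned to a template's `innerHTML` is parsed into an inert DocumentFragment exposed as `template.content`, so queries have to go through that property rather than the template element itself. A minimal sketch, where `getTitle` is a hypothetical helper name:

```javascript
// parseHtml as defined in the answer above; runs in the extension's
// background page, where `document` is available.
let parseHtml = (html) => {
    let template = document.createElement('template');
    template.innerHTML = html;
    return template;
};

// The parsed nodes live in template.content (an inert DocumentFragment),
// so query through that property, not the template element itself.
function getTitle(html) {
    let template = parseHtml(html);
    let node = template.content.querySelector("title");
    return node ? node.textContent : null;
}
```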

This helped resolve the issue.

Coder Guy

You are basically replicating the entire DOM in memory and then never releasing the memory.

We get away with this in a client side app because when we navigate away, the memory used by the scripts on that page is recovered.

In a background script, that doesn't happen and is now your responsibility.

So set both `parser` and `document` to null when you are done using them.

chrome.webRequest.onCompleted.addListener(async request => {
    if (request.tabId !== -1) {
        let html = await getHtmlForTab(request.tabId);
        let parser = new DOMParser();
        let document = parser.parseFromString(html, "text/html");
        let title = document.querySelector("title").textContent;
        console.log(title);
        parser = null; // <----- DO THIS
        document = null; // <----- DO THIS
    }
}, requestFilter);
Randy Casburn
  • That was the first thing I tried when I suspected the document variable to be the issue, but sadly that doesn't make a difference. – Coder Guy Jun 04 '19 at 21:53
  • Did you try both? `parser` & `document` – Randy Casburn Jun 04 '19 at 21:54
  • I have just tried setting both to null and still the same thing. The memory does not get free'd – Coder Guy Jun 04 '19 at 22:04
  • Does memory allocation show this listener creating that memory? Does the heap snapshot show this DOM being detached? – Randy Casburn Jun 04 '19 at 22:07
  • Yes, it does. Looking at the heap snapshot, I can see the DOM elements from each parsed page still there. I do not know if they're shown as being detached; I do not know how to tell. – Coder Guy Jun 04 '19 at 22:12
  • Does the title print? – Randy Casburn Jun 04 '19 at 22:26
  • Yes, the title still prints and my updated code is exactly like the one you suggested. Also, I saw your edited comment, the webrequest.onCompleted only runs on xhr and main document requests as defined in the requestFilter variable – Coder Guy Jun 04 '19 at 22:32
  • No problem, this problem has been bugging me for weeks. It took me ages to narrow down to the DOMParser causing the issue. Do you have any ideas as to why this may be the case? – Coder Guy Jun 04 '19 at 22:39
  • I do. Every time a document is created (even using DOMParser), the `.createHTMLDocument()` function is used to accomplish the DOM construction in the browser. That method attaches the new DOM to the root DOM. Since that reference is created (and we can't see it), then there is still a reference to it - and it won't get GC'd. As an experiment, just prior to marking `parser` null, set the root `document.body`'s (the tab's) `innerHTML` to an empty space. Let's see what happens then. – Randy Casburn Jun 04 '19 at 22:48
  • And that just turned the light on. The `title` variable is a reference to your new DOM. It will prevent GC too. So as a final (and I'm betting last) experiment, set `title` to null as well. – Randy Casburn Jun 04 '19 at 23:03
  • I set title to null, and document.body.innerHTML = '' and still nothing changes in regards to memory footprint – Coder Guy Jun 04 '19 at 23:06
  • I have discovered something by debugging: even after setting document = null, it still doesn't become null. The instance is still there. I found this out by setting a breakpoint in the debugger and inspecting the variable. The other variables like title and parser were null, but document wasn't. – Coder Guy Jun 04 '19 at 23:13
  • not `document.body.innerHTML` that is your document. I meant set the Tab's body innerHTML to an empty string. – Randy Casburn Jun 04 '19 at 23:14
  • then let's refrain from using a reserved word :-) - change the variable name – Randy Casburn Jun 04 '19 at 23:14
  • I have done that. This is the screenshot of the memory profile of the original code https://i.imgur.com/FJm6p5Y.png. This is a screenshot after setting some variables to null https://i.imgur.com/I86fRiK.png. This shows that some nodes are being cleaned up, but after a while they stay the same – Coder Guy Jun 04 '19 at 23:44

I cannot point to a confirmed Chromium bug report, but we were also hit by this memory leak. If you are developing an extension, DOMParser will leak in background scripts on Chromium-based browsers, but not on Firefox.

We could not get any of the workarounds mentioned here to fix the leak, so we ended up replacing the native DOMParser with the linkedom library, which provides a drop-in replacement and works in the browser (not only in NodeJS). It solves the leaks, so you might consider it, but there are some aspects you need to be aware of:

  • It will not leak, but its initial memory footprint is higher than using the native parser
  • Performance is most likely slower (but I have not benchmarked it)
  • The DOM generated by its HTML parser might differ slightly from what Firefox or Chrome would produce. The effect is most visible with broken HTML, where the browsers attempt to error-correct it.

We also tried jsdom first, which tries to be more compatible with the major browsers at the cost of a more complex codebase. Unfortunately, we found it difficult to make jsdom work in the browser (though it works well on NodeJS).

Philipp Claßen