I have a situation that's been torturing me for months: I keep getting OOM exceptions (Heap Space) and on inspecting heap dumps I've found millions of instances of objects I never allocated but that were likely allocated in underlying libraries. After much blood, sweat and tears I have managed to localize the code generating the memory leak and I have composed a minimal, complete and verifiable code sample to illustrate this:

import java.util.logging.Level;
import java.util.logging.Logger;
import javafx.application.Application;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.concurrent.Worker;
import javafx.scene.web.WebEngine;
import javafx.stage.Stage;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MVC extends Application implements ChangeListener<Worker.State>{

    private final WebEngine engine = new WebEngine();
    private final String url = "https://biblio.ugent.be/publication?sort=publicationstatus.desc&sort=year.desc&limit=250&start=197000";
    private final XPath x = XPathFactory.newInstance().newXPath();

    @Override
    public void start(Stage primaryStage) throws Exception {
        System.setProperty("jsse.enableSNIExtension", "false");
        engine.getLoadWorker().stateProperty().addListener(this);
        engine.load(url);
    }

    public static void main(String[] args) {
        launch(args);
    }

    private NodeList eval(Node context, String xpath) throws XPathExpressionException{
        return (NodeList)x.evaluate(xpath, context, XPathConstants.NODESET);
    }

    @Override
    public void changed(ObservableValue<? extends Worker.State> observable, Worker.State oldValue, Worker.State newValue) {
        if (newValue==Worker.State.SUCCEEDED) {
            try {
                while(true){
                    NodeList eval = eval(engine.getDocument(), "//span[@class='title']");
                    int s = eval.getLength();
                }
            } catch (XPathExpressionException ex) {
                Logger.getLogger(MVC.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
}

The code does the following:

  • load a document using the JavaFX WebEngine.
  • endlessly perform an XPath query on the document using the javax.xml packages, without storing the result or any reference to it.

To run, create a JavaFX application, add a file named MVC.java in the default package, enter the code and hit run. Any profiling tool (I use VisualVM) should quickly show you that in a matter of minutes, the heap grows uncontrollably. The following objects seem to be allocated but never released:

  • java.util.HashMap$Node
  • com.sun.webkit.Disposer$WeakDisposerRecord
  • com.sun.webkit.dom.NamedNodeMapImpl$SelfDisposer
  • java.util.concurrent.LinkedBlockingQueue$Node

This behavior happens every time I run the code, regardless of the url I load or the xpath I execute on the document.

Setup with which I tested:

  • MBP running OS X Yosemite (up-to-date)
  • JDK 1.8.0_60

Can anyone reproduce this issue? Is it an actual memory leak? Is there anything I can do?

edit

A colleague of mine reproduced the problem on a Windows 7 machine with JDK 1.8.0_45, and it happens on an Ubuntu server as well.

edit 2

I've tested jaxen as an alternative to the javax.xml package, but the results are the same, which leads me to believe the bug lies deep within the Sun WebKit code.

RDM
  • possibly related: http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance – RDM Sep 17 '15 at 16:21
  • I can reproduce this on Windows 7 64-bit, Java 1.8.0_60. It does appear to be a memory leak. I tried doing the same loop on an arbitrary XML file without involving JavaFX, and got the same result. – VGR Sep 17 '15 at 21:21
  • Thanks for looking into this! I hadn't even considered not using javafx but you're completely right, the bug lies deeper, and the way the w3c document is provided is not important. – RDM Sep 17 '15 at 22:56
  • If you want to eliminate some potential weak spots such as the potential DOM serialization you could use HTML Cleaner. I was experiencing a rather brisk class loading and memory leak due to DOM serialization before switching to this tool. Requests per second were around 70 using Apache Async Http Client pool but I also use FX. – Andrew Scott Evans Jun 29 '16 at 17:02

1 Answer

I reproduced the leak with JDK 1.8.0_60 on Ubuntu too. I did quite some profiling and debugging, and the core cause is simple and can be fixed easily. There is no memory leak in the XPath code itself.

There is a class, com.sun.webkit.Disposer, which continuously cleans up some internal structures that get created during the XPath evaluation. The disposer internally schedules the cleanup via Invoker.getInvoker().invokeOnEventThread(this); you can see this if you decompile the code. There are different implementations of the invoker, using different threads. If you work within JavaFX, the Invoker performs the cleanup periodically on the JavaFX thread.

However, your changed listener method is also called on the JavaFX thread, and it never returns, so the cleanup never gets a chance to run.
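
To make the mechanism concrete, here is a minimal sketch in plain Java with no JavaFX involved. It assumes a single-thread executor is a fair stand-in for the JavaFX thread and that a trivial counter increment stands in for the disposer's cleanup; the class and method names (EventThreadDemo, cleanupsRunDuringWork) are made up for illustration and are not part of any library.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EventThreadDemo {

    /**
     * Queues `cleanups` cleanup tasks on a single "event" thread while a
     * long-running job is in progress, and returns how many cleanups the
     * job could observe as finished before it ended.
     */
    static int cleanupsRunDuringWork(boolean workOnEventThread, int cleanups) {
        // Hypothetical stand-in for the JavaFX thread: one thread, FIFO queue.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicInteger cleaned = new AtomicInteger();
        AtomicInteger seenByWork = new AtomicInteger();
        CountDownLatch cleanupsQueued = new CountDownLatch(1);
        CountDownLatch allCleaned = new CountDownLatch(cleanups);
        CountDownLatch workDone = new CountDownLatch(1);

        // The "long-running" job: gives the cleanups up to one second to run.
        Runnable work = () -> {
            try {
                cleanupsQueued.await();
                allCleaned.await(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            seenByWork.set(cleaned.get());
            workDone.countDown();
        };

        if (workOnEventThread) {
            eventThread.execute(work);   // like looping inside changed()
        } else {
            new Thread(work).start();    // like the fix: work on its own thread
        }

        // Stand-in for the disposer: cleanup is always queued on the event thread.
        for (int i = 0; i < cleanups; i++) {
            eventThread.execute(() -> {
                cleaned.incrementAndGet();
                allCleaned.countDown();
            });
        }
        cleanupsQueued.countDown();

        try {
            workDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        eventThread.shutdown(); // already queued cleanups still run before exit
        return seenByWork.get();
    }

    public static void main(String[] args) {
        System.out.println("work on event thread: " + cleanupsRunDuringWork(true, 100));
        System.out.println("work on own thread:   " + cleanupsRunDuringWork(false, 100));
    }
}
```

With the work on the event thread, the queued cleanups sit behind it and the job should see none of them finish; with the work on its own thread, all of them should finish while the job is still running. In the question's code the job never ends, so the queue only grows.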

I modified your code, so that the changed method only spawns a new thread and returns, and the processing is done asynchronously. And guess what - the memory does not grow any more:

@Override
public void changed(ObservableValue<? extends Worker.State> observable, Worker.State oldValue, Worker.State newValue) {
    if (newValue==Worker.State.SUCCEEDED) {
        new Thread(() ->{
            try {
                while(true){
                    NodeList eval = eval(engine.getDocument(), "//span[@class='title']");
                    int s = eval.getLength();
                }
            } catch (XPathExpressionException ex) {
                Logger.getLogger(MVC.class.getName()).log(Level.SEVERE, null, ex);
            }
        }).start();
    }
}
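
A different technique from the one above, in case the work has to stay on the event thread: split the loop into batches and re-queue each batch, so that queued cleanup can run in between. In JavaFX the re-queueing would presumably be done with Platform.runLater; the sketch below again uses a single-thread executor as a stand-in, and the names (BatchedWorkDemo, run) are hypothetical. It assumes batches is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchedWorkDemo {

    /**
     * Runs `batches` batches of work on one "event" thread, queueing one
     * cleanup per batch, and returns the order in which things ran.
     */
    static List<String> run(int batches) {
        // Hypothetical stand-in for the JavaFX thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        List<String> log = new ArrayList<>(); // only written on eventThread
        CountDownLatch done = new CountDownLatch(1);
        int[] remaining = {batches};
        Runnable[] batch = new Runnable[1];

        batch[0] = () -> {
            log.add("work");
            // Stand-in for the disposer queueing its cleanup on the event thread.
            eventThread.execute(() -> log.add("cleanup"));
            if (--remaining[0] > 0) {
                eventThread.execute(batch[0]);        // re-queue instead of looping
            } else {
                eventThread.execute(done::countDown); // runs after the last cleanup
            }
        };
        eventThread.execute(batch[0]);

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        eventThread.shutdown();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

Because each batch returns before the next one starts, the cleanup queued during a batch gets its turn before more work is done, so work and cleanup alternate in the log.
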
Jan X Marek
  • Very well spotted. I was able to apply this principle in my main project as well (as stated, the code in the question was just an MVC) and now the memory no longer grows. In my main project, I didn't do anything as silly as blocking the event queue forever, or at least not on purpose, but apparently I did so accidentally (it's a big nest of asynchronous callbacks). Simply introducing a new thread where most of the crunching happens freed up the event thread to run the disposers, keeping memory in check. Thanks and +rep. – RDM Sep 22 '15 at 08:43