How to determine root cause of sluggish Nashorn performance or bottleneck with pre-compiled scripts

Question

I'm having some sluggish performance in Nashorn that I can't really explain the cause of. I'll detail what my setup is and how I've tried to debug it.

Hardware: Fairly decent server hardware ('13 era - 12 core Xeon, 2.1GHz). 64GB DDR3 RAM.

Software: Oracle JDK8 (latest 64-bit) (40GB RAM pre-allocated to the JVM).

My implementation is: Multiple Nashorn ScriptEngine instances, each with a pre-compiled "utility.js" that provides some helper functions that user-defined scripts can utilise.

I have a pool of ScriptEngine objects all ready to go with the utility.js already compiled against them and a thread allocator which will spin up threads up to a set limit. Each thread will grab a pre-allocated ScriptEngine and eval the user JS to it using a new context and execute it/store the result somewhere before returning the ScriptEngine to the pool. This is all working great and if my user script is fairly simple (single function) it's blindingly fast.

However, most user scripts are rather large and of the form:

function myFunc() {
    myFunc1();
    myFunc2();
    ... (you get the picture, they define and call a lot of functions!)
    myFunc100();
}

function myFunc1() {
 // do something simple here
}

When run in parallel, say with 25 threads at a time, and each with their own ScriptEngine (and all the pre-compiled things mentioned above) will take a very long time to execute whilst showing little CPU use (8-10% total) and no major blocking in jmc/jvisualvm. The threads will show that they've blocked a fair amount (count wise), but the slices are so minute I can never see them when clicking through the threads.

Most of the time when I click through the threads they're all showing that they're in MethodHandleNatives.setCallSiteTargetNormal.

I've tried a few things: 1. Single engine, different contexts. I could see blocking here between my threads even though it was all pre-compiled. The threads would wait (as they should) before calling individual snippets of bytecode from what I could tell. This isn't a viable solution.

Tried to inline a bunch of functions (most but not all) in the user scripts, this still didn't increase CPU usage and most threads were still in MethodHandleNatives.setCallSiteTargetNormal. Even inlined functions still seem to be heading to MethodHandleNatives.setCallSiteTargetNormal if I inspected the stack traces.

Here's how I create the ScriptEngines and pre-stuff them with the "utility.js" (the code where I stuff them into a pool has omitted to keep it short):

/**
 * Creates a PreCompiledScriptEngine which will contain a ScriptEngine + Pre-compiled utility.js
 */
private PreCompiledScriptEngine createScriptEngine() {
    String source = new Scanner(this.getClass().getClassLoader().getResourceAsStream(UTILITY_SCRIPT)).useDelimiter("\\Z").next();
    try {
        totalEngines.getAndAdd(1);
        ScriptEngine engine = new NashornScriptEngineFactory().getScriptEngine();
        return new PreCompiledScriptEngine(engine, ((Compilable) engine).compile(source));
    }
    catch (ScriptException e) {
        Logger.error(e);
    }
    return null;
}


/**
 * Small helper class to group a ScriptEngine and a CompiledScript (of utility.js) together
 */
public class PreCompiledScriptEngine {

    private ScriptEngine   scriptEngine;
    private CompiledScript compiledScript;


    PreCompiledScriptEngine(ScriptEngine scriptEngine, CompiledScript compiledScript) {
        this.scriptEngine = scriptEngine;
        this.compiledScript = compiledScript;
    }


    public ScriptEngine getScriptEngine() {
        return scriptEngine;
    }


    /**
     * This method will return the utility.js compiled runtime against our engine.
     *
     * @return CompiledScript version of utility.js
     */
    public CompiledScript getCompiledScript() {
        return compiledScript;
    }
}

And here's how I execute user specific JavaScript:

public Object executeUserScript(String script, String scriptFunction, Object[] parameters) {
    try {
        // Create a brand new context
        PreCompiledScriptEngine preCompiledScriptEngine = obtainFromMyScriptEnginePool();
        ScriptEngine engine = preCompiledScriptEngine.getScriptEngine();
        ScriptContext context = new SimpleScriptContext();
        context.setBindings(engine.createBindings(), ScriptContext.ENGINE_SCOPE);

        // Evaluate the pre-compiled utility.js in our new context
        preCompiledScriptEngine.getCompiledScript().eval(context);

        // Evaluate the specific user script in this context too
        engine.eval(script, context);
        //get the JS function the user wants to call
        JSObject jsObject = (JSObject) context.getAttribute(scriptFunction, ScriptContext.ENGINE_SCOPE);

        // Call the JS function with the parameters
        return jsObject.call(null, parameters);
    }
    catch (ScriptException e) {
        Logger.error("generated", e);
        throw new RuntimeException(e.getMessage());
    }
}

What I'm expecting out of this is that CPU usage would be 100% if my thread pool was exhausting the resources available on the machine and showing low performance, but instead I'm seeing low CPU and low performance :( I can't quite pin down where I'm going wrong here or why it's so slow without any obvious drain on resources.

One thing I just noticed while going to grab a stack trace from JVisualVM is that all my threads appear to be exhibiting this scenario: I allow the user defined Java Script to call a utility.js function which is essentially "execute another script", the stack traces all appear to be from this nested call to another script. In my setup it'd be using the same thread, and same engine from the thread again with a new context. I'd think this would be the same as before though and not require further compiling?

Related articles I've already looked at: What is the difference between anonymous and inline functions in JavaScript? and Nashorn inefficiency

Edit: Looking deeper into this, it's mostly when eval() occurs from inside a compiled script, but not all the time, something about specific cases must be making it unable to just be reinvoked directly without calling setTarget() which ends up taking more time.

The interesting thing is that the threads don't show that they're blocking when they're making these calls down to native JVM methods so it's hard to see where the time is being spent in every tool I've looked at.

How to determine root cause of sluggish Nashorn performance or bottleneck with pre-compiled scripts

0 Answers0