I have a homebrew ETL solution. The transformation layer is defined in a configuration file as JavaScript scriptlets, which are interpreted by Java's Nashorn engine.

I am encountering performance issues. Perhaps nothing can be done, but I hope someone can spot a problem in the way I am using Nashorn. The process is multi-threaded.

I create a single static ScriptEngine, which is only used to create CompiledScript objects.

private static ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");

I compile the scriptlets that will be re-executed on each record into CompiledScript objects.

public static CompiledScript compile(Reader reader) throws ScriptException {
    return ((Compilable) engine).compile(reader);
}

There are two standard JavaScript libraries that are also compiled using this method.

For each record, a ScriptContext is created, the standard libraries are added, and the record's values are set as bindings.

public static ScriptContext getContext(List<CompiledScript> libs, Map<String, ? extends Object> variables) throws ScriptException {    
    SimpleScriptContext context = new SimpleScriptContext();
    Bindings bindings = context.getBindings(ScriptContext.ENGINE_SCOPE);

    for (CompiledScript lib : libs) {
        lib.eval(context);
    }

    for (Entry<String, ? extends Object> variable : variables.entrySet()) {
        bindings.put("$" + variable.getKey(), variable.getValue());
    }
    return context;
}

The record's context is then used to transform the record and evaluate filters, all using the CompiledScripts.

public static String evalToString(CompiledScript script, ScriptContext context) throws ScriptException {
    return script.eval(context).toString();
}

The actual execution of the CompiledScripts against a ScriptContext is very fast, but the initialization of the ScriptContexts is very slow. Unfortunately, as far as I understand it, this has to be done once per set of bindings. If the record matches a filter, I have to rebuild the context a second time for the same record, this time with some additional bindings from the matched filter.

It seems very inefficient to re-execute the two standard libraries every time I create a ScriptContext, but I have found no thread-safe way to clone a ScriptContext after the libraries have been executed and before the bindings have been added. It seems equally inefficient to re-execute the two libraries and reattach all of the record's bindings when the record matches a filter, but again I have found no thread-safe way to clone a record's ScriptContext and append another binding to it without also altering the original.

According to jvisualvm, the majority of my program's time is spent in

jdk.internal.dynalink.support.AbstractRelinkableCallSite.initialize() (70%)
jdk.internal.dynalink.ChainedCallSite.relinkInternal() (14%)

I would appreciate any insight into Nashorn that could help to increase performance for this use case. Thank you.

Aaron

1 Answer

I got this working by keeping one ScriptContext per thread in a ThreadLocal, which avoids cross-talk between threads. The program below runs 1,000,000 evaluations in parallel, checks each one for cross-talk, and finds none. With this change I create ~4 ScriptContext objects instead of about 8,000,000.

package com.foo;

import java.util.UUID;
import java.util.stream.Stream;

import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

public class Bar {

    private static ScriptEngine engine;
    private static CompiledScript lib;
    private static CompiledScript script;

    // Use ThreadLocal context to avoid cross-talk
    private static ThreadLocal<ScriptContext> context;

    static {
        try {
            engine = new ScriptEngineManager().getEngineByName("JavaScript");
            lib = ((Compilable) engine)
                    .compile("var firstChar = function(value) {return value.charAt(0);};");
            script = ((Compilable) engine).compile("firstChar(myVar)");
            context = ThreadLocal.withInitial(() -> initContext(lib));
        } catch (ScriptException e) {
            // Fail fast: if compilation fails, lib and script stay null
            throw new ExceptionInInitializerError(e);
        }
    }

    // A function to initialize a ScriptContext with a base library
    private static ScriptContext initContext(CompiledScript lib) {
        ScriptContext context = new SimpleScriptContext();
        try {
            lib.eval(context);
        } catch (ScriptException e) {
            e.printStackTrace();
        }
        return context;
    }

    // A function to set the variable binding, evaluate the script, and catch
    // the exception inside a lambda
    private static String runScript(CompiledScript script,
            ScriptContext context, String uuid) {
        Bindings bindings = context.getBindings(ScriptContext.ENGINE_SCOPE);
        bindings.put("myVar", uuid);
        String result = null;
        try {
            result = ((String) script.eval(context));
        } catch (ScriptException e) {
            e.printStackTrace();
        }
        return result;
    }

    // The driver function which generates a UUID, uses Nashorn to get the 1st
    // char, uses Java to get the 1st char, compares them and prints mismatches.
    // Theoretically if there was cross-talk, the variable binding might change
    // between the evaluation of the CompiledScript and the java charAt.
    public static void main(String[] args) {
        Stream.generate(UUID::randomUUID)
                .map(uuid -> uuid.toString())
                .limit(1000000)
                .parallel()
                .map(uuid -> runScript(script, context.get(), uuid)
                        + uuid.charAt(0))
                .filter(s -> !s.substring(0, 1).equals(s.substring(1, 2)))
                .forEach(System.out::println);
    }

}
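
To carry the same idea back to the per-record "$" bindings in the question, one possible sketch is to reuse the thread's context and swap only the record's bindings between records. The names RecordContexts, Slot, bind and bindExtra are hypothetical and do not come from the code above. The sketch assumes that the library definitions stay in the reused context and that removing only the keys put for the previous record is enough to prevent stale values. The library eval is left out so the sketch uses only javax.script classes; in the real program it would run once per thread, as initContext does above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.SimpleScriptContext;

// Hypothetical helper, not part of the original program.
final class RecordContexts {

    // One reusable context per thread, plus the keys bound for the current record.
    private static final class Slot {
        final ScriptContext context = new SimpleScriptContext();
        final Set<String> recordKeys = new HashSet<>();
    }

    // Assumption: the real initializer would also eval the compiled libraries
    // into slot.context once per thread.
    private static final ThreadLocal<Slot> SLOT = ThreadLocal.withInitial(Slot::new);

    // Replace the previous record's bindings with the new record's values.
    public static ScriptContext bind(Map<String, ?> variables) {
        Slot slot = SLOT.get();
        Bindings bindings = slot.context.getBindings(ScriptContext.ENGINE_SCOPE);
        for (String key : slot.recordKeys) {
            bindings.remove(key); // drop stale values from the previous record
        }
        slot.recordKeys.clear();
        addAll(slot, bindings, variables);
        return slot.context;
    }

    // Add bindings (e.g. from a matched filter) on top of the current record.
    public static ScriptContext bindExtra(Map<String, ?> extra) {
        Slot slot = SLOT.get();
        addAll(slot, slot.context.getBindings(ScriptContext.ENGINE_SCOPE), extra);
        return slot.context;
    }

    private static void addAll(Slot slot, Bindings bindings, Map<String, ?> variables) {
        for (Map.Entry<String, ?> variable : variables.entrySet()) {
            String key = "$" + variable.getKey();
            bindings.put(key, variable.getValue());
            slot.recordKeys.add(key); // remember the key so the next bind() can remove it
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("name", "alice");
        ScriptContext context = bind(record);
        System.out.println(context.getAttribute("$name"));
    }
}
```

With this shape, a record that matches a filter would call bindExtra on the same context instead of rebuilding it, and the next call to bind clears both the record's and the filter's keys. Whether this removes the relinking cost reported by jvisualvm would need to be measured.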
Aaron