I have a homebrew ETL solution. The transformation layer is defined as JavaScript scriptlets in a configuration file, interpreted by Java's Nashorn engine.
I am encountering performance issues. Perhaps there is nothing that can be done, but I hope someone can spot something in the way I am using Nashorn that can be improved. The process is multi-threaded.
I create a single static ScriptEngine, which is only used to create CompiledScript objects.
private static ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
I compile the scriptlets that will be re-executed on each record into CompiledScript objects.
public static CompiledScript compile(Reader reader) throws ScriptException {
    return ((Compilable) engine).compile(reader);
}
There are two standard JavaScript libraries that are also compiled using this method.
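For reference, the startup compilation looks roughly like the sketch below (the LIBS field and the file names are placeholders I am using for illustration, not my real configuration):

// Sketch only: compile the two shared libraries once at startup and reuse them.
// Requires java.io.*, java.util.*, and javax.script.* imports; file names are placeholders.
private static final List<CompiledScript> LIBS = new ArrayList<>();
static {
    try {
        LIBS.add(compile(new FileReader("conf/lib1.js")));
        LIBS.add(compile(new FileReader("conf/lib2.js")));
    } catch (IOException | ScriptException e) {
        throw new ExceptionInInitializerError(e);
    }
}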
For each record, a ScriptContext is created, the standard libraries are evaluated into it, and the record's values are set as bindings.
public static ScriptContext getContext(List<CompiledScript> libs, Map<String, ? extends Object> variables) throws ScriptException {
    SimpleScriptContext context = new SimpleScriptContext();
    Bindings bindings = context.getBindings(ScriptContext.ENGINE_SCOPE);
    for (CompiledScript lib : libs) {
        lib.eval(context);
    }
    for (Entry<String, ? extends Object> variable : variables.entrySet()) {
        bindings.put("$" + variable.getKey(), variable.getValue());
    }
    return context;
}
The record's context is then used to evaluate filters and transform the record, all using the CompiledScripts (a rough usage sketch follows the method below).
public static String evalToString(CompiledScript script, ScriptContext context) throws ScriptException {
    return script.eval(context).toString();
}
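To make the per-record flow concrete, here is a rough sketch of how these pieces fit together (record, filterScript, transformScript, and the LIBS list from the sketch above are placeholder names, and I am simplifying how the filter result is interpreted):

// Sketch of the per-record flow; identifier names are placeholders.
Map<String, Object> recordValues = record.getValues();      // assumed accessor on the record type
ScriptContext context = getContext(LIBS, recordValues);     // the two libraries are re-evaluated here, per record

// Filters and transforms are evaluated against the same per-record context.
boolean matched = Boolean.parseBoolean(evalToString(filterScript, context));
String transformed = evalToString(transformScript, context);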
The actual execution of the CompiledScripts against a ScriptContext is very fast; however, the initialization of the ScriptContexts is very slow. Unfortunately, as far as I understand it, this initialization has to be done once per set of bindings. If the record matches a filter, I then have to rebuild the context a second time for the same record, this time with some additional bindings from the matched filter.
It seems very inefficient to have to re-execute the two standard libraries every time I create a ScriptContext, but I have found no thread-safe way to clone a ScriptContext after the libraries have been executed and before the record's bindings have been added. It also seems very inefficient to re-execute the two libraries and reattach all of the record's bindings when the record matches a filter, but again I have found no thread-safe way to clone a record's ScriptContext and append further bindings to it without also altering the original.
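Roughly, the rebuild on a filter match looks like this sketch (filterBindings and the other names are placeholders, continuing the sketch above):

// Sketch of the rebuild described above: when a filter matches, the only option
// I have found is to build a brand-new context, which re-runs both libraries
// and re-binds every record value just to add the filter's extra bindings.
if (matched) {
    Map<String, Object> withFilterBindings = new HashMap<>(recordValues);
    withFilterBindings.putAll(filterBindings);               // additional bindings from the matched filter
    ScriptContext rebuilt = getContext(LIBS, withFilterBindings);
    String output = evalToString(transformScript, rebuilt);
}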
According to jvisualvm, the majority of my program's time is spent in:
jdk.internal.dynalink.support.AbstractRelinkableCallSite.initialize() (70%)
jdk.internal.dynalink.ChainedCallSite.relinkInternal() (14%)
I would appreciate any insight into Nashorn that could help to increase performance for this use case. Thank you.