From what you have described, this is a problem you should approach with multiple threads.
Create a thread class that accepts your model and one test dataset, then start one thread per dataset:
EvaluateThread t1 = new EvaluateThread(threadName,model,testDataset1);
EvaluateThread t2 = new EvaluateThread(threadName,model,testDataset2);
EvaluateThread t3 = new EvaluateThread(threadName,model,testDataset3);
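Then create a synchronized method for the step that the threads have in common. Note that synchronized means only one thread at a time can run the method on a given object, so if the method lives on an object shared by all threads, keep the work inside it as small as possible.
Something like this: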
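// Model and Dataset are placeholder type names; substitute your own classes.
public synchronized double calculateError(Model model, Dataset dataset) {
    double error = 0.0;
    // do your stuff here, e.g. calculate the error of model on dataset
    return error;
}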
Finally, once every thread has finished (call join() on each one), calculate the average of the errors you get from the threads.
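The steps above can be sketched roughly as follows. This is only a minimal illustration under several assumptions: Model, Dataset, ErrorCollector and EvaluateDemo are hypothetical names that are not from the question, the error is taken to be a mean absolute error, and the EvaluateThread constructor gets an extra collector argument. The sketch also puts only the shared update (add) in a synchronized method, so the error calculation itself runs in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model type: maps one input to a prediction.
interface Model {
    double predict(double input);
}

// Hypothetical test set: inputs paired with their expected outputs.
class Dataset {
    final double[] inputs;
    final double[] targets;

    Dataset(double[] inputs, double[] targets) {
        this.inputs = inputs;
        this.targets = targets;
    }
}

// Shared by all threads; synchronized so only one thread updates it at a time.
class ErrorCollector {
    private double sum = 0.0;
    private int count = 0;

    public synchronized void add(double error) {
        sum += error;
        count++;
    }

    public synchronized double average() {
        return count == 0 ? 0.0 : sum / count;
    }
}

class EvaluateThread extends Thread {
    private final Model model;
    private final Dataset dataset;
    private final ErrorCollector collector;

    // The collector parameter is an assumption added for this sketch.
    EvaluateThread(String threadName, Model model, Dataset dataset, ErrorCollector collector) {
        super(threadName);
        this.model = model;
        this.dataset = dataset;
        this.collector = collector;
    }

    @Override
    public void run() {
        // Mean absolute error, computed without holding any lock.
        double total = 0.0;
        for (int i = 0; i < dataset.inputs.length; i++) {
            total += Math.abs(model.predict(dataset.inputs[i]) - dataset.targets[i]);
        }
        collector.add(total / dataset.inputs.length);
    }
}

class EvaluateDemo {
    static double averageError(Model model, List<Dataset> testDatasets) {
        ErrorCollector collector = new ErrorCollector();
        List<EvaluateThread> threads = new ArrayList<>();
        for (int i = 0; i < testDatasets.size(); i++) {
            EvaluateThread t = new EvaluateThread("eval-" + i, model, testDatasets.get(i), collector);
            threads.add(t);
            t.start();
        }
        for (EvaluateThread t : threads) {
            try {
                t.join(); // wait for every thread before reading the average
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + t.getName(), e);
            }
        }
        return collector.average();
    }

    public static void main(String[] args) {
        Model doubler = x -> 2 * x; // toy model for the demo
        List<Dataset> sets = List.of(
            new Dataset(new double[]{1, 2}, new double[]{2, 5}),
            new Dataset(new double[]{3}, new double[]{7}),
            new Dataset(new double[]{4}, new double[]{8}));
        System.out.println(averageError(doubler, sets));
    }
}
```

Each thread only holds the lock for the short add call, so the three evaluations do not wait on each other while computing. The sketch assumes every dataset is non-empty.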
For more information about synchronized methods, check this link.