Optaplanner: Randomly very low "average calcultate count per second" when solving in parallel

Question

I am using Optaplanner to solve a comparatively small optimization problem. For my use case, many such optimizations are required though which is why I started to run them in parallel. The parallelity is based upon Java 8' parallel stream. It doesn't allow to control the actual number of threads to be used, but I believe it to be based on the available CPU count.

For most of the solver runs this seems to work fine, but I noticed that I sometimes got invalid solutions from a single run which were not reproducible when only that problem was run alone.

After inspecting the logs, I noticed that the "average calculate count per second" was very low for invalid solution while being fine for other runs. In fact, the invalid solution was actually the (naively built) initial solution:

[rkJoinPool.commonPool-worker-6] (DefaultSolver.java:203)       Solving started: time spent (0), best score (-5hard/-2medium/168soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
[rkJoinPool.commonPool-worker-6] (DefaultConstructionHeuristicPhase.java:158) Construction Heuristic phase (0) ended: step total (0), time spent (1), best score (-5hard/-2medium/233soft).
[rkJoinPool.commonPool-worker-4] (DefaultSolver.java:203)       Solving started: time spent (1), best score (-5hard/-1medium/579soft), environment mode (REPRODUCIBLE), random (JDK with seed 0). 
[rkJoinPool.commonPool-worker-4] (DefaultConstructionHeuristicPhase.java:158) Construction Heuristic phase (0) ended: step total (0), time spent (1), best score (-5hard/-1medium/617soft).
[rkJoinPool.commonPool-worker-5] (DefaultSolver.java:203)       Solving started: time spent (1), best score (-6hard/-3medium/137soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
[rkJoinPool.commonPool-worker-7] (DefaultLocalSearchPhase.java:152) Local Search phase (1) ended: step total (42), time spent (704), best score (0hard/0medium/808soft).
[rkJoinPool.commonPool-worker-4] (DefaultLocalSearchPhase.java:152) Local Search phase (1) ended: step total (22), time spent (218), best score (0hard/0medium/1033soft). 
[rkJoinPool.commonPool-worker-5] (DefaultSolver.java:238)       Solving ended: time spent (210), best score (-6hard/-3medium/137soft), average calculate count per second (4), environment mode (REPRODUCIBLE).
[rkJoinPool.commonPool-worker-7] (DefaultSolver.java:238)       Solving ended: time spent (746), best score (0hard/0medium/808soft), average calculate count per second (25256), environment mode (REPRODUCIBLE).
[rkJoinPool.commonPool-worker-4] (DefaultSolver.java:238)       Solving ended: time spent (219), best score (0hard/0medium/1033soft), average calculate count per second (30461), environment mode (REPRODUCIBLE).

Notice how Threads 4 and 7 produce good results with 25-30k accs, while Thread 5 produced an invalid result and only used 4 accs (given the 200ms termination timeout I assume that really only one step was taken).

The following configuration was used which was determined using the benchmarker (albeit in a single-thread setup):

<termination>
    <millisecondsSpentLimit>2000</millisecondsSpentLimit>
    <unimprovedMillisecondsSpentLimit>200</unimprovedMillisecondsSpentLimit>
</termination>
<constructionHeuristic>
    <constructionHeuristicType>FIRST_FIT</constructionHeuristicType>
</constructionHeuristic>
<localSearch>
    <localSearchType>HILL_CLIMBING</localSearchType>
</localSearch>

I assume that this problem has to do with the fact that several solvers are running in parallel while a time based termination criteria is used. Is the termination time based on "wall time" or on actual CPU time?

Is using a time based termination criteria not such a good idea when running in parallel? This seems to be the best way though to use all available computing power. What could cause as single solver to seemingly at random only perform so few steps?

What are you trying to do? Multi-bet solving? Hill climbing is normally inferior to Tabu Search, Late Acceptance and the like. — Geoffrey De Smet, Aug 17 '17 at 13:39
I am trying to build a roster of soccer players out of an available team. This is part of a simulation in which this is done over and over for various teams and matchups. I'll have to look into the mentioned meta heuristics again - I think they didn't work at all in the benchmarker. — geld0r, Aug 17 '17 at 17:59

score 1 · Accepted Answer · answered Aug 17 '17 at 13:39

1

millisecondsSpentLimit and unimprovedMillisecondsSpentLimit are based on wall time, not on actual CPU time.

AFAIK, parallel streams does not limit the number of threads to the number of CPU's, as those jobs might block under IO (which is not the case for Solver.solve() calls). I prefer to use an ExecutorService with a thread pool size of Math.max(1, Runtime.getRuntime().availableProcessors() - 2).

answered Aug 17 '17 at 13:39

Geoffrey De Smet

26,223
11
73
120

I'll try this out. To be honest, I only used the stream based parallelity because it was the easiest to implement. As an alternative, would using a step based termination criteria work better, like described here? https://docs.optaplanner.org/7.1.0.Final/optaplanner-docs/html_single/index.html#stepCountTermination – geld0r Aug 17 '17 at 18:01
yes it would. Although scoreCalucationLimit is probably fairer between TS and LA (because TS is slow stepping and LA is fast stepping). – Geoffrey De Smet Aug 18 '17 at 10:18
1

I reran (and extended) my benchmarks, especially focusing on finding a good configuration for each of the meta heuristics. In the end it turned out that Tabu Search provided more reliable results for my problem (even though Hill Climbing and the others weren't much worse). I ended up not chosing a scoreCalculationLimit but only a unimprovedMillisecondsLimit of 500ms. With that configuration I didn't see the problem again. On the topic of stream & number of threads, I followed the suggestions here: https://stackoverflow.com/a/22269778/2574340 – geld0r Aug 21 '17 at 17:08

Optaplanner: Randomly very low "average calcultate count per second" when solving in parallel

1 Answers1