
I'm new to JMH. To test it out I wrote the following simple method:

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public long sumPar() {
    return LongStream.rangeClosed(1, LIMIT)
                     .parallel()
                     .sum();
}
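(As a sanity check outside JMH, the parallel sum can be verified against the closed-form Gauss formula. The bound below is a placeholder; the actual `LIMIT` used in the benchmark is not shown above.)

```java
import java.util.stream.LongStream;

public class SumCheck {
    public static void main(String[] args) {
        // Placeholder bound; the real LIMIT constant is not shown in this post
        long limit = 1_000_000L;
        long sum = LongStream.rangeClosed(1, limit).parallel().sum();
        // The result should match the closed form n(n+1)/2
        System.out.println(sum == limit * (limit + 1) / 2); // prints true
    }
}
```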

The results of the test varied greatly:

# JMH 1.10.3 (released 8 days ago)
# VM version: JDK 1.8.0_45, VM 25.45-b02
# VM invoker: /usr/lib/jvm/java-8-oracle/jre/bin/java
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: testStreams.Bench.sumPar

# Run progress: 22.21% complete, ETA 00:02:33
# Fork: 1 of 1
# Warmup Iteration   1: 53.495 ops/s
# Warmup Iteration   2: 57.971 ops/s
# Warmup Iteration   3: 57.622 ops/s
# Warmup Iteration   4: 58.113 ops/s
# Warmup Iteration   5: 57.861 ops/s
Iteration   1: 50.414 ops/s
Iteration   2: 9.207 ops/s
Iteration   3: 9.176 ops/s
Iteration   4: 9.212 ops/s
Iteration   5: 9.175 ops/s

The same behaviour was observed across multiple runs. After reducing the warm-up iterations I still saw a drop in ops/s after five or six iterations. Other benchmarks in the test suite that used parallel operations performed consistently.

I have a few questions:

  • Are there any configuration parameters in JMH I could tune to get less variance?
  • Is this the expected behaviour of multiple iterations of the method in question?
  • If JMH is reporting correctly, how might I go about debugging this method and finding out why the ops/s drop?

Some specs:

Intel i7-4810MQ @ 2.80 GHz
16 GB RAM
Ubuntu 14.04 running in VirtualBox on a Windows host
virtual machine is allocated 3 CPU cores
JDK 1.8.0_45
JMH 1.10.3

EDIT

Thanks for all the feedback, much appreciated.

I re-ran the benchmark this morning having left my laptop off all night. The inconsistent behaviour has completely disappeared. I upped the iterations to 1000 and still the ops/s remained consistent.

Checking the CPU temperature, it was stable at 84°C.

I have not been able to reproduce this issue. Next time I suspect my CPU may be overheating, I'd like to re-run this benchmark while monitoring the CPU temperature to see whether this behaviour recurs.

Running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining did show compilation patterns similar to those of the method in Erratic performance of Arrays.stream().map().sum(), but I do not think that was the root cause of this issue.


EDIT 2

I was able to reproduce the issue, and solved it by adding the -XX:MaxInlineLevel=12 flag.

DevBox:~/test$ java -jar target/benchmarks.jar testStreams.Bench -i 5 -wi 5 -f 1
# JMH 1.10.3 (released 8 days ago)
# VM version: JDK 1.8.0_45, VM 25.45-b02
# VM invoker: /usr/lib/jvm/java-8-oracle/jre/bin/java
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: testStreams.Bench.sumPar

# Run progress: 0.00% complete, ETA 00:00:30
# Fork: 1 of 1
# Warmup Iteration   1: 53.672 ops/s
# Warmup Iteration   2: 57.720 ops/s
# Warmup Iteration   3: 58.320 ops/s
# Warmup Iteration   4: 58.174 ops/s
# Warmup Iteration   5: 58.680 ops/s
Iteration   1: 49.810 ops/s
Iteration   2: 9.109 ops/s
Iteration   3: 9.427 ops/s
Iteration   4: 9.437 ops/s
Iteration   5: 9.436 ops/s

DevBox:~/test$ java -XX:MaxInlineLevel=12 -jar target/benchmarks.jar testStreams.Bench -i 1000 -wi 5 -f 1
# JMH 1.10.3 (released 8 days ago)
# VM version: JDK 1.8.0_45, VM 25.45-b02
# VM invoker: /usr/lib/jvm/java-8-oracle/jre/bin/java
# VM options: -XX:MaxInlineLevel=12
# Warmup: 5 iterations, 1 s each
# Measurement: 1000 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: testStreams.Bench.sumPar

# Run progress: 0.00% complete, ETA 00:50:16
# Fork: 1 of 1
# Warmup Iteration   1: 53.888 ops/s
# Warmup Iteration   2: 58.328 ops/s
# Warmup Iteration   3: 58.468 ops/s
# Warmup Iteration   4: 58.455 ops/s
# Warmup Iteration   5: 57.937 ops/s
Iteration   1: 58.717 ops/s
Iteration   2: 59.494 ops/s
Iteration   3: 60.013 ops/s
Iteration   4: 59.506 ops/s
Iteration   5: 51.543 ops/s

I still haven't found the root cause of why I couldn't reproduce the issue earlier when -XX:MaxInlineLevel=12 was omitted; as far as I can see I was using the same settings. I may try running the benchmark again after my laptop has been idle for a while, but for now I'm happy to have gained some insight into JIT inlining.
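For reference, the flag can also be pinned to the benchmark itself instead of being passed on the command line, via JMH's @Fork annotation and its jvmArgsAppend element. A sketch, assuming the standard JMH annotations are on the classpath and using a hypothetical LIMIT value (the original is not shown):

```java
import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class Bench {
    // Hypothetical bound; the original LIMIT value is not shown in this post
    private static final long LIMIT = 1_000_000_000L;

    @Benchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    @Fork(value = 1, jvmArgsAppend = {"-XX:MaxInlineLevel=12"})
    public long sumPar() {
        return LongStream.rangeClosed(1, LIMIT)
                         .parallel()
                         .sum();
    }
}
```

This keeps the flag with the benchmark, so a plain `java -jar target/benchmarks.jar` run picks it up without remembering the extra VM option.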

Mantis

2 Answers


I have all but confirmed that you are experiencing exactly the same behavior as described in this post: Erratic performance of Arrays.stream().map().sum()

If you run your test long enough (1000 iterations), you'll see that at some point performance is restored. The reason is an awkward inlining decision by the JIT compiler, which tears apart the hot loop code. Refer to the linked post for excellent coverage by Paul Sandoz.

Marko Topolnik
  • For me, turning off tiered compilation via `@Fork(jvmArgsAppend= {"-XX:-TieredCompilation"})` seems to help in this case. As does bumping the inlining depth as suggested in the linked answer. – the8472 Jul 25 '15 at 09:19
  • This pretty much nails these two as the same issue. – Marko Topolnik Jul 25 '15 at 09:21
  • I re-ran the benchmark this morning with 1000 iterations. The degradation in performance has disappeared. I think this case may be due to CPU overheating, however I am yet to verify this. – Mantis Jul 25 '15 at 11:55
  • I have already confirmed it is not due to overheating by reproducing it locally, as has @the8472. Furthermore, the effects of JVM flags (especially increasing the inlining budget) on the problem are quite conclusive. – Marko Topolnik Jul 25 '15 at 11:59
  • Apologies, my understanding of @the8472's first answer below was that he suggested this **was** due to overheating. How have you reproduced this locally? I re-ran the exact same benchmark with the exact same settings on the exact same laptop and did **not** reproduce it; I'm a little confused. I'm not suggesting that the effects of JVM flags don't improve performance - far from it. It's just that I re-ran the same byte code (I did not recompile the class) and did not reproduce the issue. I think the degradation in performance in my particular case was due to overdriving the CPU. – Mantis Jul 25 '15 at 12:20
  • 1) the8472's comment below this answer states which flags he used to avoid the performance drop. This implies that he observed the drop as well. 2) I have reproduced by running JMH locally with your code. 3) The JVM flags were used to identify the problem, not to fine-tune the JVM. 4) Performance degradation due to CPU clock speed can never be as dramatic as what you observed. It would also probably not be as sudden and would change more often as the clock speed is readjusted. – Marko Topolnik Jul 25 '15 at 12:29
  • 1) The @the8472 comment I'm referring to is - "That's a mobile CPU as far as I can tell. Check if it's exhausting its thermal design envelope and throttling down the CPU clock to avoid overheating". That's pretty clear-cut what he's suggesting there; however, you are forming your own opinions about another comment he made. 2) I cannot reproduce with the same code on the same laptop. 3) Is, as you said, "increasing the inlining budget" not tuning the JVM? 4) May well be true, however that's an assumption - I'm using scientific measurements and not seeing a drop in ops/s when the CPU temp is stable. – Mantis Jul 25 '15 at 12:52
  • 1) the8472's answer makes no more than an educated guess; his comment under my answer implies quite obviously that he a) reproduced the issue and b) avoided it using the stated flags. 2) I can, however, and completely deterministically so. You may have inadvertently changed something which made the issue invisible to you. 3) You may use any term you like as long as you realize the consequences of that particular flag having that particular effect on measurements. 4) Your scientific measurements currently point to no conclusion at all. You have no observation. – Marko Topolnik Jul 25 '15 at 13:01
  • I was not able to reproduce it initially because my local JMH setup which I re-used already had a bunch of flags configured which prevented the issue. Switching to default VM parameters allowed me to reproduce the problem. So @MarkoTopolnik diagnosis is reproducible. Of course that does not preclude the possibility that you also experienced CPU overheating. – the8472 Jul 25 '15 at 13:02
  • The "CPU overheating" hypothesis is very simple to dismiss: use `@Fork(5)` and watch the behavior reliably reproduced each time on a hot CPU. – Marko Topolnik Jul 25 '15 at 13:05
  • Agreed, no conclusion at all. As I said, I am yet to verify that this was caused by CPU overheating. Yes, maybe I changed something inadvertently. However, I'm 99.9% sure I just re-ran the same byte code with the same settings today as I did last night. – Mantis Jul 25 '15 at 13:09
  • Thanks for the input, using @Fork(5) did indeed reproduce the issue. The CPU temp of one core reached 89 degrees this time. Still looking into this. At least I can reproduce it now. – Mantis Jul 25 '15 at 13:23

intel i7-4810MQ

That's a mobile CPU as far as I can tell. Check if it's exhausting its thermal design envelope and throttling down the CPU clock to avoid overheating.
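On Linux, clock throttling can be spotted from the shell. A minimal sketch, assuming the standard /proc and /sys paths (in a VirtualBox guest the thermal zones may be missing or meaningless, so check the temperature on the Windows host as well):

```shell
# One-shot sample of the current core clock speeds;
# re-run while the benchmark is hot to spot a throttled-down clock
grep "cpu MHz" /proc/cpuinfo || true

# Raw thermal-zone readings in millidegrees Celsius (may be absent in a VM)
cat /sys/class/thermal/thermal_zone*/temp 2>/dev/null || true
```

The `sensors` tool from the lm-sensors package gives friendlier per-core temperatures if it is installed and configured.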

the8472
  • I tested on a laptop with a regular CPU, same behavior. – Marko Topolnik Jul 25 '15 at 08:21
  • I re-ran the benchmark this morning having left my laptop turned off overnight. The inconsistent behaviour has completely disappeared - even over 1000 iterations. I think @the8472 has nailed it. However, I am yet to re-create the issue and then check the CPU temperature to verify this. – Mantis Jul 25 '15 at 11:54