I encountered a performance problem that puzzles me. For my studies, I am working on a diffusion-limited aggregation (DLA) simulation (basically a fancy random walker). To speed things up, I parallelized my code using a parallel stream with map, so that multiple walkers are spawned and walk independently until they hit something, and each then returns its position. Performance scaled quite well on my laptop from 1 to 7 threads.
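For readers who want a concrete picture of the setup, here is a minimal sketch of that pattern on a 2D lattice. It is not my actual code: the class and method names (DlaSketch, walk, releaseWalkers, grow), the lattice size, the launch rule (a random free interior site) and the "rounds" structure are all hypothetical. Walkers in one round are released through a parallel IntStream, each walks against the same read-only snapshot of the cluster, and the returned positions are stuck onto the cluster sequentially afterwards.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical minimal 2D lattice DLA sketch; size, launch rule and rounds are assumptions.
class DlaSketch {
    static final int SIZE = 101;          // hypothetical lattice size
    static final int CENTER = SIZE / 2;   // position of the seed particle

    // True if any of the four lattice neighbours of (x, y) is occupied.
    static boolean touchesCluster(boolean[][] c, int x, int y) {
        return c[x - 1][y] || c[x + 1][y] || c[x][y - 1] || c[x][y + 1];
    }

    // One walker: start on a random free site, step until adjacent to the cluster,
    // then return the position where it would stick. The cluster is only read here.
    static int[] walk(boolean[][] cluster) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current(); // per-thread RNG, no shared state
        int x, y;
        do {
            x = rnd.nextInt(1, SIZE - 1);
            y = rnd.nextInt(1, SIZE - 1);
        } while (cluster[x][y]);
        while (!touchesCluster(cluster, x, y)) {
            // Clamp to [1, SIZE - 2] so neighbour lookups never leave the array.
            switch (rnd.nextInt(4)) {
                case 0:  x = Math.min(x + 1, SIZE - 2); break;
                case 1:  x = Math.max(x - 1, 1);        break;
                case 2:  y = Math.min(y + 1, SIZE - 2); break;
                default: y = Math.max(y - 1, 1);        break;
            }
        }
        return new int[] {x, y};
    }

    // Release n independent walkers in parallel; each returns its final position.
    static List<int[]> releaseWalkers(boolean[][] cluster, int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(i -> walk(cluster))
                .collect(Collectors.toList());
    }

    // Grow the cluster: walkers of one round share a snapshot, then stick on this thread.
    static boolean[][] grow(int rounds, int walkersPerRound) {
        boolean[][] cluster = new boolean[SIZE][SIZE];
        cluster[CENTER][CENTER] = true;
        for (int r = 0; r < rounds; r++) {
            for (int[] p : releaseWalkers(cluster, walkersPerRound)) {
                cluster[p[0]][p[1]] = true;
            }
        }
        return cluster;
    }

    // Number of occupied lattice sites.
    static int count(boolean[][] c) {
        int n = 0;
        for (boolean[] row : c) for (boolean b : row) if (b) n++;
        return n;
    }
}
```

Calling DlaSketch.grow(5, 20) grows a small cluster from the central seed. One design choice worth noting: each walker draws from ThreadLocalRandom rather than a shared java.util.Random. A shared Random is thread-safe, but its Javadoc warns that concurrent use across threads may encounter contention and poor performance, which is one thing worth ruling out when a parallel stream scales worse with more cores.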
Now I wanted to run bigger simulations, so I got myself a bigger machine. The result was a massive performance decrease. I compared both systems: my laptop with an Intel i7-4712HQ (8 threads, Geekbench 12k) was three times faster than my server with 4x Intel E7-4870 (80 threads, Geekbench 35k).
I checked the load with htop during the run: the laptop showed an average load of 8 versus 70 on the server, so the cores are being used and are not idling.
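To compare how many threads the parallel stream can actually use on each machine, a quick check is to print what the JVM reports. This is a minimal sketch; the class name ParallelismCheck is hypothetical. By default, parallel streams run on the common ForkJoinPool, whose parallelism defaults to availableProcessors() - 1 (the calling thread also takes part), and it can be overridden with the system property java.util.concurrent.ForkJoinPool.common.parallelism.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper: reports what the JVM sees, to compare laptop and server.
class ParallelismCheck {
    static String report() {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Parallelism of the common pool that parallel streams use by default.
        int poolParallelism = ForkJoinPool.getCommonPoolParallelism();
        return "availableProcessors=" + cpus + ", commonPoolParallelism=" + poolParallelism;
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```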
Can that really be right? Both machines run Ubuntu and Oracle Java 8. I would greatly appreciate any suggestions on where to look for the mistake.
Best regards
P.S. I can post the code or provide more details if needed.