
Intro

I have been working on a piece of software that I am now testing to see how much it benefits from concurrency. I am testing the same software on two different systems:

  • System 1: 2 x Intel(R) Xeon(R) CPU E5-2665 @ 2.40GHz with a total of 16 cores and 64GB of RAM, running Scientific Linux 6.1 and Java SE Runtime Environment (build 1.7.0_11-b21).
  • System 2: Lenovo ThinkPad T410 with an Intel i5 processor @ 2.67GHz with 4 cores and 4GB of RAM, running Windows 7 64-bit and Java SE Runtime Environment (build 1.7.0_11-b21).

Details: The program simulates patients with type 1 diabetes. It does some import (read from CSV), some numerical computations (Dopri54 + Newton) and some export (write to CSV).

I have exclusive rights to the server, so there should be no noise at all.

Results

These are my results:

[Plot: execution times for 10-100 threads on System 1 (S1)]

[Plot: execution times for 10-100 threads on System 2 (S2)]

Now, as you can see, System 1 is only just as fast as System 2, despite being a much more powerful machine. I have no idea why this is the case, and I am confident that the software being run is identical on both. The number of threads goes from 10 to 100.

Question:

Why do the two runs have similar execution times despite System 1 being significantly more powerful than System 2?

UPDATE!

Now, I just thought a bit about what you guys said about it being an I/O or memory issue. So I thought that if I could reduce the file size, it would speed up the program, right? I managed to reduce the import file size by a factor of 5; however, there was no performance improvement at all. Do you guys still think it is the same problem?

  • What does your program do? – Sotirios Delimanolis Feb 19 '14 at 16:45
  • It's a program simulating patients with type 1 diabetes. It does some import (read from CSV), some numerical computations (Dopri54) and some export (write to CSV). – SteewDK Feb 19 '14 at 16:48
  • *Very hand-waving argument* My guess is that, because Java floats above the hardware (on the JVM), the bottleneck is actually the Java Virtual Machine, and it doesn't matter how fast your hardware is. At this point, the Java Virtual Machine dictates everything. – Caffeinated Feb 19 '14 at 16:51
  • @SotiriosDelimanolis I amended it to "significantly more powerful". – SteewDK Feb 19 '14 at 16:51
  • @SteewDK Perhaps your code doesn't really exploit multithreading enough, e.g. you have synchronization points that serialize the CPU-intensive parts, or it's I/O bound. – nos Feb 19 '14 at 16:52
  • For computation-intensive tasks you should have as many threads as there are cores. To avoid the I/O bottleneck, you can dedicate separate threads to reading and writing data; see the sketch after these comments. – Tarik Feb 19 '14 at 16:58
  • @nos thx for your comment - I will try and see if removing the final CSV output helps. – SteewDK Feb 19 '14 at 17:09
  • @Tarik that was a well-thought-out idea. But actually, as it is now, each thread does its own import, computation and export (independent of the others). – SteewDK Feb 19 '14 at 17:09
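A minimal sketch of Tarik's suggestion, assuming a hypothetical setup in which compute tasks hand their output rows to a single dedicated writer thread through a `BlockingQueue`, with the compute pool sized to the core count:

```java
import java.util.concurrent.*;

public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService computePool = Executors.newFixedThreadPool(cores);
        final BlockingQueue<String> results = new LinkedBlockingQueue<>();

        // One dedicated writer thread drains the queue, so the compute
        // threads never block on disk I/O themselves.
        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    String line;
                    while (!(line = results.take()).equals("EOF")) {
                        System.out.println(line); // stand-in for the actual CSV write
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        writer.start();

        // One task per patient; each computes its result and queues it.
        for (int patient = 0; patient < 100; patient++) {
            final int id = patient;
            computePool.submit(new Callable<Void>() {
                public Void call() throws InterruptedException {
                    String csvLine = "patient-" + id; // placeholder for Dopri54 + Newton output
                    results.put(csvLine);
                    return null;
                }
            });
        }

        computePool.shutdown();
        computePool.awaitTermination(1, TimeUnit.HOURS);
        results.put("EOF"); // sentinel telling the writer to stop
        writer.join();
    }
}
```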

2 Answers


As you write .csv files, it is possible that the bottleneck is not your computation power, but the write rate of your hard disk.
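One way to test this hypothesis is to time a comparably sized CSV export in isolation and compare the measured throughput against the disk's rated speed. A minimal sketch, where the file name and row content are placeholders:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DiskWriteBenchmark {
    public static void main(String[] args) throws IOException {
        String row = "0.1,0.2,0.3,0.4,0.5\n"; // stand-in for one CSV record
        int rows = 1_000_000;

        long start = System.nanoTime();
        try (BufferedWriter out =
                Files.newBufferedWriter(Paths.get("bench.csv"), StandardCharsets.UTF_8)) {
            for (int i = 0; i < rows; i++) {
                out.write(row);
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        long bytes = (long) rows * row.length();
        System.out.printf("Wrote %d MB in %d ms (%.1f MB/s)%n",
                bytes / 1_000_000, elapsedMs, bytes / 1000.0 / elapsedMs);
    }
}
```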

exception1
  • Well, I did think about that - but shouldn't 64GB of RAM do the trick? – SteewDK Feb 19 '14 at 16:54
  • RAM has very little to do with disk I/O throughput; see my answer for a more detailed response anyway. – Tim B Feb 19 '14 at 16:56
  • RAM actually does not influence how fast files are read/written on the disk. Especially if the number of files is relatively big (let's say 2 files per patient) compared to the numerical computation, additional RAM and/or processors won't help. File I/O is much, much slower than numerical/logical instructions. – exception1 Feb 19 '14 at 17:00
  • Actually, as it is now, each thread does its own import, computation and export (independent of the others). In addition, I can choose to store the final export in memory instead (export to a ConcurrentHashMap). Do you have other suggestions for optimization? – SteewDK Feb 19 '14 at 17:10
  • Your threads can compute their stuff in parallel, but they can't write in parallel. You can use the `ThreadPoolExecutor` class, so you will have less overhead from creating Thread objects. I think the best you can do, as you stated, is to store all information in a HashMap and write the content to one file after all the data has been computed; see the sketch after these comments. – exception1 Feb 19 '14 at 17:24
  • Running the threads in parallel for disk access could well slow things down, as it will cause the drives to have to hop around all the time. – Tim B Feb 19 '14 at 18:42
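A minimal sketch of the pattern exception1 describes, with a placeholder per-patient computation: results accumulate in a `ConcurrentHashMap` while the pool runs, and a single sequential pass writes the file afterwards:

```java
import java.io.PrintWriter;
import java.util.Map;
import java.util.concurrent.*;

public class ComputeThenWrite {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        final ConcurrentHashMap<Integer, String> results = new ConcurrentHashMap<>();

        // Compute in parallel; no disk I/O happens inside the tasks.
        for (int patient = 0; patient < 100; patient++) {
            final int id = patient;
            pool.submit(new Runnable() {
                public void run() {
                    results.put(id, "simulated,output,row"); // placeholder computation
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        // One sequential write pass once all the data has been computed.
        try (PrintWriter out = new PrintWriter("results.csv")) {
            for (Map.Entry<Integer, String> e : results.entrySet()) {
                out.println(e.getKey() + "," + e.getValue());
            }
        }
    }
}
```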

Almost certainly this means that either CPU time is not the bottleneck for this application, or that something about it is making it resistant to effective parallelization, or both.

For example, if reading the data from disk is actually the limiting factor, then faster disks are what matters, not faster processors.

If it's running out of memory, then that will be a bigger bottleneck.

If it takes more time to spawn each thread than to do the actual processing inside the thread, then thread creation overhead dominates.

etc.

In this sort of optimization work, metrics are king. You need hard, solid numbers for how long things are taking and where in your program you are losing that time. Only then can you see where to focus your efforts and whether they are effective.
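A minimal way to get such numbers, assuming the program's phases can be called separately; `importCsv`, `compute` and `exportCsv` are hypothetical stand-ins for the real methods:

```java
public class PhaseTimer {
    // Stubs standing in for the program's real phases.
    static void importCsv() { /* read patient data from CSV */ }
    static void compute()   { /* Dopri54 + Newton */ }
    static void exportCsv() { /* write results to CSV */ }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        importCsv();
        long t1 = System.nanoTime();
        compute();
        long t2 = System.nanoTime();
        exportCsv();
        long t3 = System.nanoTime();

        System.out.printf("import: %d ms, compute: %d ms, export: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000);
    }
}
```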

Tim B
  • The I/O controller might be an additional bottleneck; it doesn't necessarily have to be the disk itself. I've seen that first-hand on a development machine. But it might just as easily be the configuration of the Java runtime that is off. A dump of the runtime configuration settings would be in order to rule that out. – Gimby Feb 19 '14 at 16:58
  • @Gimby how do I figure out the runtime config? What I do in Linux before executing is clearing the Java tool options with `unset JAVA_TOOL_OPTIONS` – SteewDK Feb 19 '14 at 17:05
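One way to dump the kind of runtime configuration Gimby mentions is the standard management API; a minimal sketch (running the JVM with `-XX:+PrintFlagsFinal` is an alternative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class JvmConfigDump {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println("JVM:        " + runtime.getVmName() + " " + runtime.getVmVersion());
        System.out.println("Arguments:  " + runtime.getInputArguments()); // JVM flags actually in effect
        System.out.println("Processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Max heap:   " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```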