time javac Main.java                                      --> 0m1.050s
time javac Main.java & javac Main.java                    --> 0m1.808s
time javac Main.java & javac Main.java & javac Main.java  --> 0m2.690s
time javac Main.java & ... (8 times)                      --> 0m8.309s

When we run javac commands in parallel, each additional javac command adds about 1 second to the time it takes for all of them to complete.

Why does the time grow linearly?

Are the running javac processes contending for some kind of lock? If so, how can we overcome it so that the time does not grow linearly?


PS: I have tried the above on single-core, dual-core, and 4-core machines; all showed the same behaviour.

PS2: environment RedHat7, javac 1.7.0_79

Bhuvan
  • 3
    Could easily be I/O-bound, not CPU-bound. – T.J. Crowder Jun 06 '15 at 16:12
  • 1
    any idea on how to confirm if its io bound... does look like so since our data is very small – Bhuvan Jun 06 '15 at 16:13
  • Hmmm, `javac` wants files. I guess you could use a RAM disk. I'd also ensure that `Main.java` was **large** so you're really checking compilation vs. load/save. But of course, the real question is: What are you trying to optimize? Because if it's the entire process, I/O is an important factor you won't want to test around. – T.J. Crowder Jun 06 '15 at 16:16
  • At least in your tests above, the file "Main.java" could be locked by javac while it's being processed. I assume you've tested with different files? – markspace Jun 06 '15 at 16:22
  • I want to compile java files present in different directories separately... my Main.java is a hello world example...any idea on how to check if there are locks involved – Bhuvan Jun 06 '15 at 16:22
  • @markspace I have tried with different Main.java files in different directories as well... same behaviour – Bhuvan Jun 06 '15 at 16:23
  • What flavor of linux? It's been a long while but can you verify that starting javac this way is efficient? Like does just reading the javac image from disc and starting execution take a full second? There may be ways to optimize that. – markspace Jun 06 '15 at 16:25
  • i am not bench-marking compile time.. i am bench-marking running multiple javac.. the question is why 1 sec increment per addition of javac.. – Bhuvan Jun 06 '15 at 16:28
  • Tested above on RHEL7, RHEL6 and windows7 – Bhuvan Jun 06 '15 at 16:30
  • Here's a [Stack Exchange question](http://unix.stackexchange.com/questions/169326/calling-multiple-bash-scripts-and-running-them-in-parallel-not-in-sequence) that gets into GNU `parallel`, you might try that. I think I would do the opposite as you. Take a large project, time it for one javac. Then take the number of files roughly in half, and use two javac instances. Keep subdividing until you reach a point of diminishing returns. I think a single file is too small for efficient compilation. – markspace Jun 06 '15 at 16:37
  • You should certainly try compiling renamed variants of the source: Maina, Mainb, etc. It's certainly possible the destination file or class dir is locked. – Gene Jul 19 '15 at 14:09
  • @Gene already tried -- no change in behaviour – Bhuvan Jul 19 '15 at 14:28
  • 1
    You didn't mention your compiler and build system. See http://blog.jetbrains.com/idea/2012/12/intellij-idea-12-compiler-twice-as-fast/ – Gene Jul 19 '15 at 16:35
  • @Gene no specific build system...plain old javac command – Bhuvan Jul 19 '15 at 16:41
  • There's more than one javac. Oracle? The other thing you should try is compiling _packages_ separately (i.e. put each source in a separate package). Locks could be at the package level. – Gene Jul 19 '15 at 16:50

1 Answer


The Java compiler already handles dividing its work across available processors, even when compiling only a single file. Therefore running separate compiler instances in parallel yourself won't yield the performance gains you are expecting.

To demonstrate this, I generated a large (1 million lines, 10,000 methods) Java program in a single file called Main1.java. I then made additional copies as Main2.java through Main8.java. Compile times are as follows:

Single file compile:

time javac Main1.java &    --> (real) 11.6 sec

Watching this single file compile in top revealed processor usage mostly in the 200-400% range (indicating multiple CPU usage, 100% per CPU), with occasional spikes in the 700% range (the max on this machine is 800% since there are 8 processors).
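The generator used for Main1.java is not shown in this answer. A minimal sketch of one way to produce a file of similar size (about 1 million lines, 10,000 methods) might look like the following. The class name `LargeSourceGenerator`, the method names `m0`, `m1`, ... and the repeated arithmetic statement are all assumptions chosen for illustration, not the original test file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical generator; the answer does not show how its test file was built.
class LargeSourceGenerator {

    // Builds the source of one class containing `methods` static methods
    // (assumed to be at least 1), each with `bodyLines` simple statements.
    static String generate(String className, int methods, int bodyLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        for (int m = 0; m < methods; m++) {
            sb.append("    static int m").append(m).append("(int x) {\n");
            for (int i = 0; i < bodyLines; i++) {
                sb.append("        x = x * 31 + 7;\n");
            }
            sb.append("        return x;\n");
            sb.append("    }\n");
        }
        // A small main so the generated class is also runnable.
        sb.append("    public static void main(String[] args) {\n");
        sb.append("        System.out.println(m0(1));\n");
        sb.append("    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 10,000 methods x 100 lines each (97 body lines plus header,
        // return and closing brace) is roughly one million lines.
        String source = generate("Main1", 10_000, 97);
        Files.write(Paths.get("Main1.java"), source.getBytes(StandardCharsets.UTF_8));
    }
}
```

Copies such as Main2.java would also need the class renamed to match the file name, which can be done by passing a different `className`.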

Next, two files simultaneously:

time javac Main1.java &    --> (real) 14.5 sec
time javac Main2.java &    --> (real) 14.8 sec

So it only took 14.8 seconds to compile two, when it took 11.6 seconds to compile one. That's definitely non-linear. It was clear from watching top while these were running that, again, each Java compiler was taking advantage of at most four CPUs at once (with occasional spikes higher). Because of this, the two compilers ran across eight CPUs mostly in parallel with each other.

Next, four files simultaneously:

time javac Main1.java &    --> (real) 24.2 sec
time javac Main2.java &    --> (real) 24.6 sec
time javac Main3.java &    --> (real) 25.0 sec
time javac Main4.java &    --> (real) 25.0 sec

Okay, here we've hit the wall. We can no longer out-parallelize the compiler. Four files took 25 seconds when two took 14.8. There's a little optimization there but it's mostly a linear time increase.

Finally, eight simultaneously:

time javac Main1.java &    --> (real) 51.9 sec
time javac Main2.java &    --> (real) 52.3 sec
time javac Main3.java &    --> (real) 52.5 sec
time javac Main4.java &    --> (real) 53.0 sec
time javac Main5.java &    --> (real) 53.4 sec
time javac Main6.java &    --> (real) 53.5 sec
time javac Main7.java &    --> (real) 53.6 sec
time javac Main8.java &    --> (real) 54.6 sec

This was actually a little worse than linear, as eight took 54.6 seconds while four only took 25.0.

So I think the takeaway from all this is to have faith that the compiler will do a decent job trying to optimize the work you give it across the available CPU resources, and that trying to add additional parallelization by hand will have limited (if any) benefit.
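To make the scaling in these measurements easier to see, the wall-clock time of each batch can be divided by the number of files compiled. The sketch below does only that arithmetic on the slowest time reported for each batch above; the class and method names are hypothetical.

```java
// Arithmetic on the wall-clock times reported above (slowest process per batch).
class ScalingCheck {

    // Average seconds per file when `files` parallel compiles finish in `wallSeconds`.
    static double secondsPerFile(int files, double wallSeconds) {
        return wallSeconds / files;
    }

    public static void main(String[] args) {
        int[] files = {1, 2, 4, 8};
        double[] wall = {11.6, 14.8, 25.0, 54.6}; // seconds, from the runs above
        for (int i = 0; i < files.length; i++) {
            System.out.printf("%d file(s): %.1f s total, %.2f s per file%n",
                    files[i], wall[i], secondsPerFile(files[i], wall[i]));
        }
    }
}
```

The per-file cost drops from 11.6 s to 7.4 s when going from one to two processes, then stays at roughly 6 to 7 s for four and eight. That matches the observation that the machine's CPUs are already saturated once two compilers are running.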

Edit:

For reference, there are two entries I found in Oracle's bug database regarding enhancing javac to take advantage of multiple processors:

  • Bug ID: JDK-6629150 -- The original complaint, this was eventually marked as a duplicate of:
  • Bug ID: JDK-6713663 -- Suggests the resolution, and based on the "Resolved Date" it appears that multi-processor support in javac was added on 2008-06-12.
Kevin Hoffman
  • 1
    Nice explanation, but if we take 8 hello-world Java files (as opposed to yours) and compile them in parallel using 8 javac processes on an 8-core machine, it should complete in 1 sec, right? – Bhuvan Jul 23 '15 at 14:50
  • @user2410148: using a small "Hello world" file I got the following compile times: 1 took 0.32 sec; 2 took 0.39 sec; 4 took 0.54 sec; 8 took 1.02 sec. It seems to follow the same pattern, where breaking it into two compiles parallelized okay, but then trying to do four or eight was a more linear increase in time. I think the same concept applies on the small scale as well: let the compiler do the parallelization for you. – Kevin Hoffman Jul 23 '15 at 15:07
  • I don't think your conclusion is sound. To prove your point shouldn't you be timing how long it takes to compile 1-8 in series? Otherwise you are comparing two different amounts of work. By my estimate of your numbers I can kick off javac and wait 88 seconds or I can parallelize it from the command line and wait 55 seconds. Most of my java files are in the 500 line range, I'm guessing javac will have a harder time making use of multiple cores in smaller files/classes. I'm not saying that 16000 parallel javac is the ideal solution but I disagree with the advice to just have faith. – Ryan Jul 24 '15 at 15:31
  • 2
    @Ryan: My analysis was meant to demonstrate that javac is indeed multi-threaded and does a decent job at spreading work across the available processors. This doesn't mean that you can't get some better performance in certain cases by running multiple instances in parallel, but you shouldn't expect time/N type improvements as the asker expected. FWIW, I ran the eight compiles in series and it took 89 seconds. Then tried a single `javac *.java` which took 56 secs, which is very close to the 54 secs running them in parallel. Also see my previous comment - smaller files yield similar results. – Kevin Hoffman Jul 24 '15 at 16:55
  • `Then tried a single javac *.java which took 56 secs ` This helps prove your point. – Ryan Jul 28 '15 at 17:06
  • Can you post and link to the `Main.java` you generated? –  Aug 11 '16 at 23:03