Quick question: what is the compiler flag that allows g++ to spawn multiple instances of itself in order to compile large projects more quickly (for example, 4 source files at a time on a multi-core CPU)?
Will it really help? All my compile jobs are I/O bound rather than CPU bound. – Brian Knoblauch Jan 06 '09 at 13:28

Even if they are I/O bound you can probably keep the I/O load higher while the CPU-heavy bits are happening (with just one g++ instance there will be lulls), and possibly gain I/O efficiency if the scheduler has more choice about what to read from disk next. My experience has been that judicious use of `make -j` almost always results in some improvement. – Flexo Aug 22 '11 at 09:35

@BrianKnoblauch But on my machine (real or in VirtualBox) it's CPU bound; I can see via the `top` command that the CPU is busy while compiling. – superK Jul 19 '13 at 09:07

Even if they are I/O bound, we can use gcc's `-pipe` flag to reduce the pain. – superK Jul 19 '13 at 09:09

Just saw this on Google: https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode_using.html#parallel_mode.using.prereq_flags – Jim Michaels Jun 25 '14 at 00:30

@JimMichaels That's completely unrelated: it's about parallel processing as part of the program at runtime, i.e. _after_ compilation. The question is about compiling in parallel using multiple jobs. – underscore_d Jul 05 '16 at 18:35

https://david-smith.org/blog/2011/07/27/is-compilation-cpu-bound/ This fellow thinks it's CPU bound. – tofutim Mar 15 '18 at 21:40
9 Answers
You can do this with make. With GNU make it is the -j flag (it will also help on a uniprocessor machine).
For example if you want 4 parallel jobs from make:
make -j 4
You can also run gcc in a pipe with
gcc -pipe
This runs the compile stages through pipes rather than temporary files, which will also help keep the cores busy.
If you have additional machines available too, you might check out distcc, which will farm compiles out to those as well.
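To make the parallelism concrete, here is a minimal Makefile sketch (the file names main.c and util.c are hypothetical). The two object rules are independent of each other, so make -j 2 can run both gcc invocations at once:

# Hypothetical two-file project; recipe lines must be indented with tabs.
CC = gcc
CFLAGS = -O2 -pipe

app: main.o util.o
	$(CC) -o app main.o util.o

# These two rules share no dependencies, so -j can run them in parallel.
main.o: main.c
	$(CC) $(CFLAGS) -c main.c

util.o: util.c
	$(CC) $(CFLAGS) -c util.c

Invoking make -j 2 app then compiles main.c and util.c concurrently before the (necessarily serial) link step.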

Yes, something like that makes sense given there is I/O as well, although it may need some tuning if using -pipe too. – frankodwyer Jan 06 '09 at 21:12

Thanks. I kept trying to pass "-j#" to gcc via CFLAGS/CPPFLAGS/CXXFLAGS. I had completely forgotten that "-j#" was a parameter for GNU make (and not for GCC). – chriv Sep 30 '12 at 03:24

Why does the *-j* option for GNU Make need to be 1.5x the number of CPU cores? – Alex Bitek Oct 12 '12 at 07:43

The *1.5* number is because of the noted *I/O bound* problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better, and you could even go as high as *2x*. See also: [Gnu make `-j` arguments](http://stackoverflow.com/questions/2499070/gnu-make-should-j-equal-number-the-number-of-cpu-cores-in-a-system) – artless noise Jul 31 '13 at 20:39

make -j is broken for me: with mingw-w64 it causes a long list of compilation errors, whereas without it the project compiles fine. I don't recommend it. I recommend submitting a bug report to the GNU make folks. – Jim Michaels Jun 25 '14 at 00:19

Will the Raspberry Pi 2 benefit from this flag? Will it be able to compile faster thanks to the new 4-core processor? – Piotr Kula Feb 17 '15 at 21:53

@JimMichaels It could be because dependencies are badly set within your project (a target starts building even if its dependencies are not ready yet), so that only a sequential build ends up being successful. – Antonio May 28 '15 at 14:36

-pipe is not pipelining; it uses pipes instead of temporary files. It uses more memory, but in some cases is faster. If your project is large, it might be worth trying. – Nick Feb 25 '17 at 03:23

Ok so this is quite a while after the original discussion, BUT: compiling Emacs git master on an AMD Threadripper 1950x with 16 cores using Fedora 27 completely breaks this `1.5x` rule of thumb. There are 32 threads available, and 16 physical cores, yet `make -j 12` is roughly fastest, with approximately `1m15` to `1m20` user time. Increasing the arg to `-j` only increases build times. So in effect it's more like `3/4x` or `3/8x` depending on whether you want to count SMT "cores" or not. Of course it's entirely possible here that TR just rips through the parallelizable parts quickly... – jjpe Dec 19 '17 at 22:18

There is no such flag, and having one runs against the Unix philosophy of having each tool perform just one function and perform it well. Spawning compiler processes is conceptually the job of the build system. What you are probably looking for is the -j (jobs) flag to GNU make, a la
make -j4
Or you can use pmake or similar parallel make systems.
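For instance, with BSD make (packaged as bmake on many Linux distributions) the jobs flag is spelled the same way:

bmake -j4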

http://www.gnu.org/software/make/manual/html_node/Parallel.html also http://www.gnu.org/software/make/manual/html_node/Options-Summary.html#Options-Summary – Jim Michaels Jun 25 '14 at 00:25

_"Unix pedantry is not helpful"_ Good thing it wasn't pedantry then, anonymous editor. Rolled back. Reviewers, please pay more attention to what you're doing. – Lightness Races in Orbit May 10 '16 at 20:08

Despite the claim of non-pedantry, gcc is getting a flag -fparallel-jobs=N. Better tell the GCC devs they're doing it wrong. – Spike0xff Jun 25 '21 at 17:51

If using make, issue it with -j. From `man make`:

-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the last one is effective. If the -j option is given without an argument, make will not limit the number of jobs that can run simultaneously.

And most notably, if you want to script around or detect the number of cores you have available (which can vary a lot depending on your environment, if you run in many environments), you may use the ubiquitous Python function cpu_count():

https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count

Like this:

make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')

If you're asking why 1.5, I'll quote user artless-noise in a comment above:

The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better, and you could even go as high as 2x.
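If you'd rather not depend on Python, a shell-only sketch of the same 1.5x rule using nproc from GNU coreutils (shell arithmetic is integer-only, so 1.5x is written as *3/2):

make -j $(( $(nproc) * 3 / 2 ))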

Most Linux users will likely prefer the shorter: ``make -j`nproc` `` with `nproc` from GNU Coreutils. – Ciro Santilli OurBigBook.com Dec 25 '18 at 10:36

If you're using an SSD, I/O isn't going to be as much of an issue. Just to build on Ciro's comment above, you can do this: `make -j $(( $(nproc) + 1 ))` (make sure you put spaces where I have them). – Ed K Nov 08 '19 at 17:12

Nice suggestion using Python. On systems where `nproc` isn't available, e.g. in `manylinux1` containers, it saves additional time by avoiding a `yum update`/`yum install`. – hoefling May 01 '20 at 21:11

make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ is parallelizable.
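As an illustrative sketch (the numbers are arbitrary), the two switches can be combined: run up to 8 jobs, but hold off starting new ones while the system load average is at or above 8:

make -j 8 -l 8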

+1 for mentioning the `-l` option (it holds off starting new jobs while the load average is above the given limit). Otherwise it seems that the linker job can begin before all object files are built (as some compilations are still ongoing), so that the linker job fails. – NGI Aug 14 '18 at 14:27

distcc can also be used to distribute compiles not only across the current machine, but also to other machines in a farm that have distcc installed.
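A minimal sketch of a typical distcc invocation (the host names are hypothetical; it assumes distcc is installed and reachable on each machine):

# Machines that will accept compile jobs, tried in order.
export DISTCC_HOSTS='localhost buildbox1 buildbox2'
# Prefix the compilers with distcc and raise -j to cover the remote cores.
make -j 12 CC='distcc gcc' CXX='distcc g++'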

Looks like there are a few that work "like" distcc as well: http://stackoverflow.com/questions/5374106/distributed-make/13770116#13770116 – rogerdpack Sep 04 '13 at 23:18

I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of threads make can create) will allow make to run multiple g++ jobs at the same time (so long as the files do not depend on each other).

No, N is not the number of threads! Many people misunderstand that: `-j N` tells make how many processes should be spawned at once, not threads. That's the reason why it is not as performant as MS `cl -MT` (really multithreaded). – Sebi2020 Jun 04 '15 at 11:45

What happens if `N` is too large? E.g. can `-j 100` break the system, or is `N` merely an upper bound that is not required to be reached? – mercury0114 Dec 08 '20 at 15:42

GNU parallel
I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:
sudo apt-get install parallel
ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"
Explanation:

- {.} takes the input argument and removes its extension
- -t prints out the commands being run, to give us an idea of progress
- --will-cite removes the request to cite the software if you publish results using it...

parallel is so convenient that I could even do a timestamp check myself:
ls | grep -E '\.c$' | parallel -t --will-cite "\
if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
gcc -c -o '{.}.o' '{}'
fi
"
xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or to run multiple commands with it: Calling multiple commands through xargs
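For comparison, a rough xargs -P equivalent of the first parallel command above (a sketch: it assumes file names without spaces and uses a small sh wrapper for the extension manipulation):

# Compile every .c file in parallel; ${1%.c}.o strips the .c suffix.
ls | grep -E '\.c$' | xargs -P "$(nproc)" -I{} sh -c 'gcc -c -o "${1%.c}.o" "$1"' _ {}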
Parallel linking was asked at: Can gcc use multiple cores when linking?
TODO: I think I read somewhere that compilation can be reduced to matrix multiplication, so maybe it is also possible to speed up single file compilation for large files. But I can't find a reference now.
Tested in Ubuntu 18.10.

You can use make -j$(nproc). This builds a project with the make build system, with multiple jobs running in parallel.

For example, if your system has 4 CPU cores, running make -j$(nproc) instructs make to run 4 jobs concurrently, one per CPU core, speeding up the build process.

You can also see how many cores you have by running this command:

echo $(nproc)
