
Is there any way or tool to apply GPU acceleration to compiling programs with the GCC compiler? Right now I have a program that compiles a given list of programs iteratively, and it takes a few minutes. I know of a few programs, like Pyrit, that use GPU acceleration for precomputing hashes.

If there are no such tools available, please advise on whether to use OpenCL or something else to re-program my code.

clu3Less
  • Rather unclear, at least to me. Are you looking for a compiler that automatically "GPU-accelerates" your code, or a GPU-accelerated compiler? – unwind Dec 07 '11 at 14:40
  • I'm pretty sure he means a GPU-accelerated compiler. – RCE Dec 07 '11 at 14:44
  • I rather doubt that compilation will benefit from running on a GPU. Could you be more specific about (1) what you are trying to accomplish and (2) what you've done to identify the bottlenecks in your existing process. – dmckee --- ex-moderator kitten Dec 07 '11 at 18:52
  • I'm trying to implement something similar to [ACOVEA](http://en.gentoo-wiki.com/wiki/Acovea), but ACOVEA is very slow. I was just wondering if there is a way to accelerate this program with GPU acceleration. I'm sorry if I'm blabbering; I don't know much about GPU acceleration. – clu3Less Dec 07 '11 at 20:55
  • In this case it is not ACOVEA that is slow, but the individual builds. That's not surprising, a lot of builds are inefficient and a lot of ink has been spilled about how that might be improved, but none of that is in the control of ACOVEA nor will it be in *your* control. I think you're just out of luck. What this process *could* benefit from is parallelizing the individual builds across many cores (or better separate machines with their own IO infrastructure). Still, the tests have to be run locally no matter what. – dmckee --- ex-moderator kitten Dec 08 '11 at 15:12
  • Thanks @dmckee. One last question: so it's not possible to re-program this in CUDA or OpenCL to improve performance? – clu3Less Dec 09 '11 at 00:49

2 Answers


A. In an imperative programming language, statements execute in sequence, and each statement may change the program's state. Analyzing a translation unit is therefore an inherently sequential task.

An example: consider how constant propagation works:

a = 5;
b = a + 7;
c = a + b + 9;

You need to go through those statements in order before you can conclude that the values assigned to b and c are compile-time constants.

(However separate basic blocks may possibly be compiled and optimized in parallel with each other.)
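A toy sketch of that pass (hypothetical Python operating on source strings, whereas a real compiler works on an IR) makes the sequential dependence explicit: each statement can only be folded once the constants feeding it are already known.

```python
# Toy constant propagation over a straight-line basic block.
# Each statement is (target, expression); expressions may reference
# earlier targets, so folding must proceed in statement order.
block = [
    ("a", "5"),
    ("b", "a + 7"),
    ("c", "a + b + 9"),
]

consts = {}  # names known to hold compile-time constants
for target, expr in block:
    # Substitute already-known constants into the expression...
    for name, value in consts.items():
        expr = expr.replace(name, str(value))
    # ...and if nothing unknown remains, the target is a constant too.
    try:
        consts[target] = eval(expr)  # toy evaluator; only digits and '+' here
    except NameError:
        pass  # expression still refers to an unknown value

print(consts)  # {'a': 5, 'b': 12, 'c': 26}
```

Processing the statements in any other order would leave `b` and `c` unresolved, which is the point: the analysis is sequential by nature.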

B. On top of this, different passes need to execute sequentially as well, and affect each other.

An example: Based on a schedule of instructions, you allocate registers, then you find that you need to spill a register to memory, so you need to generate new instructions. This changes the schedule again.

So you can't execute passes like register allocation and scheduling in parallel either (actually, I think there are articles where computer scientists/mathematicians have tried to solve these two problems together, but let's not go into that).

(Again, one can achieve some parallelism by pipelining passes.)
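A minimal sketch of that feedback loop (a hypothetical toy allocator, not how a production compiler is structured): with two physical registers and three simultaneously live values, allocation must emit spill code, and that spill code mutates the very instruction stream the scheduler produced.

```python
# Toy illustration: 2 physical registers, a schedule with 3 values live
# at once. Allocation inserts store/load instructions, changing the
# instruction stream the scheduler handed it.
REGS = ["r0", "r1"]
schedule = ["def a", "def b", "def c", "use a", "use c", "use b"]

in_reg = {}   # value -> register currently holding it
out = []      # rewritten stream, including inserted spill instructions

def take_register(idx):
    """Return a free register, spilling the value whose next use is furthest."""
    free = [r for r in REGS if r not in in_reg.values()]
    if free:
        return free[0]
    def next_use(v):
        uses = [i for i, s in enumerate(schedule) if i >= idx and s == f"use {v}"]
        return uses[0] if uses else len(schedule)
    victim = max(in_reg, key=next_use)
    reg = in_reg.pop(victim)
    out.append(f"store {reg} -> [mem_{victim}]")  # spill: a brand-new instruction
    return reg

for idx, insn in enumerate(schedule):
    op, val = insn.split()
    if op == "use" and val not in in_reg:
        reg = take_register(idx)
        out.append(f"load [mem_{val}] -> {reg}")  # reload: another new instruction
        in_reg[val] = reg
    elif op == "def":
        in_reg[val] = take_register(idx)
    out.append(f"{insn}  ; in {in_reg[val]}")

print(out)
```

The six scheduled instructions come out as nine: the stores and loads the allocator inserted would now have to be scheduled themselves, which is why the two problems feed back into each other.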

Moreover, GPUs especially don't fit because:

  1. GPUs are good at floating-point math, something compilers don't need or use much (except when optimizing floating-point arithmetic in the program being compiled).

  2. GPUs are good at SIMD, i.e. performing the same operation on multiple inputs. This, again, is not something a compiler needs to do. There might be a benefit if the compiler needed to, say, optimize several hundred floating-point operations away (a wild example: the programmer defined several large FP arrays, assigned constants to them, and then wrote code to operate on them; a very badly written program indeed).

So apart from parallelizing compilation of basic blocks and pipelining passes, there is not much parallelism to be had at the level of 'within compilation of a C file'. But parallelism is possible, easy to implement, and constantly used at a higher level. GNU Make, for example, accepts a `-jN` argument, which tells it to run up to N independent jobs at once (usually compiling a bunch of files is what GNU Make is used for anyway), spawning N processes, i.e. N instances of gcc compiling different files in parallel.
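The same file-level parallelism can be sketched in a few lines of Python. The per-file "compile" step below is a stand-in (in practice it would invoke `gcc -c` via `subprocess`); the point is that the translation units are independent, so a worker pool can process them concurrently, exactly as `make -jN` does.

```python
from concurrent.futures import ThreadPoolExecutor

def compile_unit(source):
    """Stand-in for one compiler invocation (in practice: a subprocess
    running `gcc -c source`). Each unit is independent of the others."""
    return source[:-2] + ".o"   # pretend we produced an object file

sources = ["main.c", "parser.c", "lexer.c", "codegen.c"]

# Four workers, roughly analogous to `make -j4`. (make spawns processes;
# threads suffice in this sketch because the real work would happen in
# child gcc processes anyway.)
with ThreadPoolExecutor(max_workers=4) as pool:
    objects = list(pool.map(compile_unit, sources))

print(objects)  # ['main.o', 'parser.o', 'lexer.o', 'codegen.o']
```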

ArjunShankar
  • You seem to suggest that compilation cannot be parallelized; projects like [distcc](http://en.wikipedia.org/wiki/Distcc) seem to be clear evidence to the contrary. Parallelizing compilation is standard practice these days. Regardless, I don't doubt that GPU acceleration of compilation may remain infeasible -- there are many limiting factors like bus throughput, etc. – Frank Farmer Feb 05 '13 at 18:50
  • @FrankFarmer - I am not sure if you read my entire answer. I haven't used `distcc`, but a glance at [*this*](http://distcc.googlecode.com/svn/trunk/doc/web/index.html) shows me that `distcc` runs at the *file* granularity. It preprocesses stuff on the host machine, and sends out entire files to hosts to process. i.e. each file by itself is still compiled serially. This is pretty much the same *level of granularity* as GNU Make does with the `-j` option. – ArjunShankar Feb 06 '13 at 10:20
  • @FrankFarmer - If you look at the last paragraph in the answer, you'll see where `distcc` fits in. – ArjunShankar Feb 06 '13 at 10:23
  • Are all passes in a compiler considered sequential in nature? How about localized IL reductions? Also, if user code uses floating point, so does the compiler: anything the compiler can resolve at compile time, it will do so during compilation. That means const floating point, vector, and SIMD math can potentially benefit from exploiting parallelism. – kchoi Dec 19 '14 at 20:59
  • @kchoi I do not understand what you mean by *localized IL reductions*. That said, I understand that some passes operating on independent blocks of code can be parallelised. Pass execution may itself be pipelined. And of course compilation of separate translation units can be parallelized. But this sort of parallelisation isn't something that fits a GPU. As to SIMD operations: I don't expect that a compiler has the opportunity to do hundreds of floating point operations at the same time. – ArjunShankar Dec 19 '14 at 21:39
  • I think that for now, I am satisfied with the answer I have written and believe it correctly explains why a GPU isn't suited for accelerating compilers. If you disagree, please feel free to downvote this answer, and/or write your own. I'd be glad to read and learn from it. I could be wrong and I am certainly not a compiler expert. – ArjunShankar Dec 19 '14 at 21:42
  • @ArjunShankar, "some passes operating on independent blocks of code can be parallelised" is what I meant. You're correct in that most of the compiler logic is sequential in nature. To add to the ones you've already listed, most dataflow based optimizations are sequential. There is some benefit to parallelizing actual compilation itself, for JIT compilers. The compiler may also not deal frequently with const float/vector ops, but who knows what the user writes. You could argue that the user could improve his program, but there may be more to gain from doing this. – kchoi Dec 19 '14 at 21:55
  • This was useful critique. Thanks. I have edited the answer to try and talk about these. – ArjunShankar Dec 19 '14 at 22:17

If what you are asking is, "Can you automatically write GPU-accelerated code for use with GCC and LLVM?" the answer is yes. NVIDIA and Google maintain open-source LLVM-based compiler projects:

  • NVIDIA CUDA LLVM
  • Google gpucc

If your question is, "Can I use the GPU to speed up compilation of generic, non-CUDA code?" the answer is currently no. The GPU is good at certain things, like massively parallel tasks, and bad at others, like the branch-heavy work that compilers are all about. The good news is that you can use a network of PCs with CPUs to get 2-10x compile speedups, depending on how optimized your build already is, and before resorting to network builds you can get gains with less hassle by using the fastest multi-core CPU and a high-speed SSD available for your desktop.

There are tools, such as distcc, that distribute C/C++/Objective-C compile jobs across a network of computers. distcc was included in older versions of Xcode but has since been removed, and there is no support for using it with Swift.

There is a commercial tool similar to distcc called Incredibuild, which supports Visual Studio C/C++ and Linux development environments.

There are some good articles about real-world use of Incredibuild vs. distcc, and about how both compare to the native compiler's incremental build support when making small changes (e.g. a single line in a single file) without recompiling everything else. Points to consider:

  • You can speed up a code base significantly by pre-compiling headers, using multiple DLLs, and using incremental builds on a single machine.
  • Incredibuild is a more complete solution: it automatically distributes work and guarantees the same result as a serial compile. distcc does this for free, but you have to do a lot more work to get the same results, and more still for compatibility with anything other than gcc.
  • For a detailed review, see http://gamesfromwithin.com/how-incredible-is-incredibuild
Alex Peake