
Recently I was told to look at how C functions are compiled into LLVM bytecode, and then how that bytecode is translated into x86 ASM. As a regular GNU gcc user, I have some questions about this, to put it mildly.

Does gcc compile to bytecode, too? Can it? I was under the impression that gcc compiles directly into ASM. If it does use an intermediate form, is there a way to view it, as there is with clang?

With clang, `clang ~/prog_name.c -S -emit-llvm -o -` prints the bytecode for prog_name.c to the terminal.
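
A minimal sketch for experimenting, assuming clang is on the PATH: a deliberately tiny function whose IR is short enough to read line by line. The file name `sample.c` and the function `add_ints` are hypothetical; the commands to try are in the comment at the top.

```c
/*
 * sample.c: a hypothetical, deliberately tiny input for reading LLVM IR.
 *
 * Emit human-readable IR (.ll) without optimization:
 *     clang -S -emit-llvm -O0 sample.c -o sample-O0.ll
 * Emit it again with light optimization, for comparison:
 *     clang -S -emit-llvm -O1 sample.c -o sample-O1.ll
 * Emit x86 assembly from the same source to compare side by side:
 *     clang -S -O1 sample.c -o sample.s
 */
int add_ints(int a, int b) {
    int sum = a + b; /* one signed addition */
    return sum;
}
```

At -O0, clang gives each parameter and local variable its own stack slot (`alloca`, then `store`/`load`), which accounts for much of the clutter; at -O1 the body should shrink to roughly a single `add nsw i32` followed by `ret`. Diffing the two `.ll` files first, and only then comparing with `sample.s`, may be a gentler way in than reading the IR cold.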

Also, I find the bytecode rather byzantine; by comparison, assembly language seems like light reading. In other words, I have little idea what it is saying.

Does anyone have advice or references for at least roughly deciphering what the bytecode says? Currently I compare it against the actual ASM, and calling that slow going would be a compliment.

Perhaps this is all comically naive, but I find it quite challenging to break through the surface of this.

d0rmLife
  • http://llvm.org/releases/1.3/docs/CommandGuide/html/llvmgcc.html – Robert Harvey Mar 18 '13 at 23:26
  • And for the docs on LLVM language itself: http://llvm.org/docs/LangRef.html – Pavel Minaev Mar 18 '13 at 23:27
  • @RobertHarvey That looks significantly more helpful than http://llvm.org/docs/LangRef.html. Thank you! – d0rmLife Mar 18 '13 at 23:29
  • @RobertHarvey: note that you're linking to a *very* old release of LLVM. llvmgcc has long been deprecated. Its functionality is now implemented via the dragonegg gcc plugin. – Eli Bendersky Mar 18 '13 at 23:37
  • @EliBendersky: Thanks. I did notice that it was quite old. – Robert Harvey Mar 18 '13 at 23:38
  • Perhaps the source of your confusion is the term "C bytecode" - there is no such thing. The LLVM intermediate representation (called "LLVM bitcode" or "LLVM IR") is an [SSA](http://en.wikipedia.org/wiki/Static_single_assignment_form) programming language which is not related to C. GCC uses another intermediate representation (actually, multiple ones) which is not related to C either. – Oak Mar 19 '13 at 09:27
  • Try running `gcc -fdump-tree-ssa` – SK-logic Mar 20 '13 at 11:07
  • @SK-logic How do I examine the dump? I checked the manual and ran `gcc -fdump-tree-ssa program.c -o program.ssa`, but `nano program.ssa` looks like Wingdings... or is that how it is supposed to look? – d0rmLife Mar 20 '13 at 16:44
  • @d0rmLife, you're looking at a binary file (the compiled program, since `-o` names the executable). You need to examine the file with a name like 'program.c.????.ssa'; it should be generated in the same directory as your `program.c`. – SK-logic Mar 20 '13 at 16:47
  • @SK-logic Nice. That looks better :P – d0rmLife Mar 20 '13 at 16:58

1 Answer


Perhaps try taking a look at the language reference.

As far as I know, GCC has an IR as well, known as GIMPLE (another reference here).
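
A sketch of how to look at GIMPLE for a small function, assuming a reasonably recent GCC. The file name `max.c` and the function `max_int` are hypothetical; `-fdump-tree-gimple` and `-fdump-tree-ssa` are real GCC dump flags (the latter is the one suggested in the comments on the question). The function has a branch so that the dumps contain more than one basic block.

```c
/*
 * max.c: a hypothetical input with a branch, so the dumps have
 * several basic blocks to look at.
 *
 * Dump GIMPLE right after gimplification (three-address code):
 *     gcc -c -fdump-tree-gimple max.c
 * Dump the function after it has been rewritten into SSA form:
 *     gcc -c -fdump-tree-ssa max.c
 * Each dump is a plain-text file with a name along the lines of
 * max.c.NNNt.gimple or max.c.NNNt.ssa; the number, and where the
 * file lands, vary between GCC versions.
 */
int max_int(int a, int b) {
    int m;
    if (a > b)      /* two paths that assign m differently... */
        m = a;
    else
        m = b;
    return m;       /* ...and merge again before the return */
}
```

In the `.ssa` dump you would typically find a `PHI` node at the merge point choosing between the two versions of `m`; it plays the same role as the `phi` instruction in LLVM IR, so learning to spot it helps with reading both.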

If you mean that you would rather analyze the assembly output instead of the IR, you can take a look at this question which describes how to output an assembly file.

Jorge Israel Peña
  • @d0rmLife Indeed, I believe the gcc site is temporarily down. I linked them anyway because they are the links you'll want to see (i.e. the official ones). – Jorge Israel Peña Mar 18 '13 at 23:34