
Context

I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs types for all those runtime types (functions, structs, etc) and instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR I'd like to write the headers in C and compile to the bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.

Issue

The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any function definitions. Clang has -femit-all-decls, which sounds like it's supposed to solve this, but unfortunately it doesn't, and Googling suggests it's misnamed: it only affects unused definitions, not declarations.

I then thought that if I compiled the headers alone into .gch files, I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format). However, doing so fails with error: Invalid bitcode signature, so something must be different. Perhaps the difference is small enough to work around?

Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion on approaching this general use case, keeping in mind I don't want to statically link in the runtime function bodies for every single object file I generate, just the types. I imagine this is something other compilers have needed as well so I wouldn't be surprised if I'm approaching this wrong.


e.g. given this input:

runtime.h

struct Foo {
  int a;
  int b;
};

struct Foo * something_with_foo(struct Foo *foo);

I need a bitcode file with this equivalent IR

runtime.ll

; ...etc...

%struct.Foo = type { i32, i32 }

declare %struct.Foo* @something_with_foo(%struct.Foo*)

; ...etc...

I could write it all by hand, but this would be duplicative as I also need to create C headers for other interop and it'd be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.


Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode

jayphelps

2 Answers


Clang's precompiled headers implementation does not seem to output LLVM IR, but only the AST (Abstract Syntax Tree) so that the header does not need to be parsed again:

The AST file itself contains a serialized representation of Clang’s abstract syntax trees and supporting data structures, stored using the same compressed bitstream as LLVM’s bitcode file format.

The underlying binary format may be the same, but it sounds like the content is different and LLVM's bitcode format is merely a container in this case. This is not very clear from the help page on the website, so I am just speculating. An LLVM/Clang expert could help clarify this point.

Unfortunately, there does not seem to be an elegant way around this. What I suggest in order to minimize the effort required to achieve what you want is to build a minimal C/C++ source file that in some way uses all the declarations that you want to be compiled to LLVM IR. For example, you just need to declare a pointer to a struct to ensure it does not get optimized away, and you may just provide an empty definition for a function to keep its signature.

Once you have a minimal source file, compile it with clang -O0 -S -emit-llvm <file>.c -o precompiled.ll to get a module with all the definitions as textual LLVM IR (use -c instead of -S, with a .bc output name, to get bitcode).

An example from the snippet you posted:

#include <stddef.h>  // for NULL

struct Foo {
  int a;
  int b;
};

// Fake function definition.
struct Foo *  something_with_foo(struct Foo *foo)
{
    return NULL;
}

// A global variable.
struct Foo* x;

Output that shows that definitions are kept: https://godbolt.org/g/2F89BH
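For a large runtime, writing each fake definition by hand gets tedious, so one option is to generate them from a single list. The following is only a sketch of that idea, not something clang provides: the RUNTIME_FUNCTIONS list, the AS_PROTOTYPE and AS_STUB macros, the runtime_foo_sum function and the keep_struct_foo variable are all hypothetical names invented for illustration. It also declares a global of the struct type by value, on the assumption that newer LLVM releases with opaque pointers may not keep a named struct type that is only reached through a pointer.

```c
#include <stddef.h>

struct Foo {
  int a;
  int b;
};

/* Hypothetical single list of runtime functions:
   X(return type, name, parenthesized parameter list). */
#define RUNTIME_FUNCTIONS(X)                              \
  X(struct Foo *, something_with_foo, (struct Foo *foo))  \
  X(int, runtime_foo_sum, (const struct Foo *foo))

/* Expand every entry to a plain prototype, as the real header would. */
#define AS_PROTOTYPE(ret, name, params) ret name params;
RUNTIME_FUNCTIONS(AS_PROTOTYPE)

/* Expand every entry to a fake definition so the signature is emitted.
   The (ret)0 trick only works for scalar and pointer return types;
   void or struct returns would need their own macro. */
#define AS_STUB(ret, name, params) ret name params { return (ret)0; }
RUNTIME_FUNCTIONS(AS_STUB)

/* A global by value, so the struct layout itself is used in the module. */
struct Foo keep_struct_foo;
```

Compiling this file with the clang -emit-llvm command above should give one define per list entry. The bodies still have to be removed afterwards, so this only automates writing the stubs; it does not avoid them.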

Banex
  • Thanks! I'm trying to avoid this as then I need to write code that removes the definitions (function bodies) from the LLVMModule, otherwise it complicates the linking process later with duplicate bodies. – jayphelps May 23 '18 at 23:40
  • @jayphelps You can use the `Function::deleteBody` function for that as a one-liner ([link](http://llvm.org/doxygen/classllvm_1_1Function.html#a0020cbf9c3df714558a9b20a6267bd29)). Just iterate on all functions on the precompiled module and call `deleteBody` on them. – Banex May 24 '18 at 09:41
  • Thanks Banex! I'm aware, I'd just really prefer not to hack around this, and even though I was fairly confident there was no non-hacky solution, I thought I'd ask. If no better solution is proposed, your answer will be selected as the accepted one (and bounty winner) in 4 days. Thanks again! – jayphelps May 26 '18 at 03:31

So, clang doesn't actually filter out the unused declarations. It defers emitting forward declarations till their first use. Whenever a function is used it checks if it has been emitted already, if not it emits the function declaration.

You can look at these lines in the clang repo.

// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
  if (!FD->doesDeclarationForceExternallyVisibleDefinition())
    return;

The simple fix here would be to either comment out the last two lines or just add && false to the second condition.

// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
  if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
    return;

This will cause clang to emit a declaration as soon as it sees it. It might also change the order in which definitions appear in your .ll (or .bc) files, which I assume is not an issue for you.

To make it cleaner, you could also add a new command-line flag, for example --emit-all-declarations, and check it here before returning.

Ajay Brahmakshatriya
  • So wonderful! Of course it'd be ideal to not have to modify clang, but this is indeed the correct answer for a non-hacky solution and it's possible they'll accept a PR when I add it as a flag. Thanks much! – jayphelps May 29 '18 at 20:16
  • @jayphelps you can try submitting a PR but whether they will accept it depends on if they think this is an important feature. And I also doubt there is any other way to achieve this result without modifying `clang`, since these declarations are explicitly dropped here. – Ajay Brahmakshatriya May 30 '18 at 03:57