
My goal is to generate call graphs using CMake + Clang + GraphViz at build time.

Using the processes in [1, 2] I can create simple graphs, but I'm not sure how to generalise them to a CMake project.

I have an executable target.

add_executable(${TARGET} ${SOURCES})

From within a macro, I add the graph-related options to the target:

target_compile_options(${TARGET} PRIVATE -S -emit-llvm)

And I add an additional post-build command which generates the call graphs:

add_custom_command(
    TARGET ${TARGET}
    POST_BUILD
    COMMENT "Running clang OPT"
    COMMAND opt -analyze -dot-callgraph
)

But clang still attempts to create an executable for the target, which results in this error:

[build] lld-link: error: Container.test.cpp.obj: unknown file type

I also don't understand how any custom command (opt for example) would access the produced LLVM representation. It doesn't look like my custom command has any knowledge of the relevant files (even if the above error was fixed).


What I understand so far:

  1. CMake add_executable adds the -o outfile.exe argument to clang, which prevents me from doing the same steps shown in the linked processes [1, 2].
  2. $<TARGET_FILE:${TARGET}> can be used to find the files produced by clang, but I don't know if this works for LLVM representation.
  3. I've tried using a custom target instead, but had issues getting all the TARGET sources with all the settings into the custom target.
  4. The process outlined here [3] might be relevant, especially -Wl,-save-temps, but this seems to be a pretty roundabout way to get IR (using llvm-dis).
  5. The unknown file type error is due to the object actually being LLVM representation; I suspect the linker expects a different format.
  6. To get the linker to understand LLVM representation, add -flto to the linker options: target_link_options(${TARGET} PRIVATE -flto) (source [4]). This is awesome, because it means I've almost solved this... I just don't know how to get the path to the produced bitcode output files in CMake. Once I do, I can pass them to opt (I hope...).
  7. To get the target's object files, the generator expression $<TARGET_OBJECTS:${TARGET}> can be used. With CMake this lists the .o files (is the .o just a rename by CMake?), which here are LLVM bitcode.
  8. The .o file in this case is bitcode, but the opt tool appears to accept only textual LLVM representation. To convert: llvm-dis bitcode.bc -o llvm_asm.ll. Due to cross compilation, I believe the mangled symbols are in a strange format. Passing them into llvm-cxxfilt does not succeed, for example llvm-cxxfilt --no-strip-underscore --types ?streamReconstructedExpression@?$BinaryExpr@AEBV?$reverse_iterator@PEBD@std@@AEBV12@@Catch@@EEBAXAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z
  9. Addressing 8: this is the MSVC name-mangling format. So when compiling on Windows, clang uses MSVC name mangling. A surprise to me... (source [5]).
  10. LLVM ships with llvm-undname, which is able to demangle these symbols. When I give it raw input it produces many errors; it seems to only work when handed well-formed symbols. The tool demumble appears to be a cross-platform, multi-format wrapper of llvm-undname and llvm-cxxfilt.

  11. My almost-working CMake macro is as follows:

macro (add_clang_callgraph TARGET)
    if(CALLGRAPH)
        target_compile_options(${TARGET} PRIVATE -emit-llvm)
        target_link_options(${TARGET} PRIVATE -flto)
        
        foreach (FILE $<TARGET_OBJECTS:${TARGET}>)
            add_custom_command(
                TARGET ${TARGET}
                POST_BUILD
                COMMAND llvm-dis ${FILE}
                COMMAND opt -dot-callgraph ${FILE}.ll
                COMMAND demumble ${FILE}.ll.callgraph.dot > ${FILE}.dot
            )
        endforeach()
    endif()
endmacro()

However, this doesn't work... The contents of ${FILE} is always the entire list...

This is still the case here:

foreach (FILE IN LISTS $<TARGET_OBJECTS:${TARGET}>)
    add_custom_command(
        TARGET ${TARGET}
        POST_BUILD
        COMMAND echo ${FILE}
    )
endforeach()

The result looks like:

thinga.obj;thingb.obj

This is because CMake doesn't evaluate the generator expression until after the foreach loop has run. The loop therefore executes only once, with FILE holding the unresolved generator expression rather than its value (source [6]). This means I cannot loop through the object files and create a custom command for each one.
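As an illustration of the observation above: by the time the single custom command runs, the generator expression has expanded to one ';'-joined string, so a script receiving it can do the per-file splitting that foreach could not. A minimal Python sketch, assuming a hypothetical helper named split_cmake_list and assuming no object path contains a ';' itself:

```python
def split_cmake_list(args):
    """Flatten command-line arguments that may each be a ';'-joined CMake list."""
    paths = []
    for arg in args:
        # CMake lists are ';'-separated strings; drop empty pieces.
        paths.extend(piece for piece in arg.split(";") if piece)
    return paths


# The echo output shown above, as a script would receive it in one argument.
print(split_cmake_list(["thinga.obj;thingb.obj"]))  # ['thinga.obj', 'thingb.obj']
```

The helper also accepts arguments that are already separate, so it should behave the same whether or not CMake expands the list before invoking the script.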


I'll add to the above as I find things out. If I figure out the whole process, I'll post a solution.

Any help would be greatly appreciated, this has been a great pain in the arse.


What I'm hoping for is a way to make CMake accept building an executable to a single LLVM representation file, using that file with opt to get the call graph, and then finishing the compilation with llc. I'm a little constrained though, as I'm cross compiling. Ultimately anything equivalent will do...

David Ledger
  • You may want to have a look at [this](https://github.com/compor/llvm-ir-cmake-utils) project. Regarding the above and talking from experience, I'd say maybe use something like https://github.com/SRI-CSL/gllvm or LTO to get the whole bitcode, but since you're cross-compiling you might be indeed restricted. In theory, they should work till the bitcode extraction. – compor Nov 24 '20 at 09:15
  • In theory, they should work till the bitcode extraction. Can you not combine the LTO .o files (essentially bitcode) with a custom post-build rule using `$`? Also, aren't you supposed to stop the compilation with `-c`? Lastly, `opt` accepts both .bc and .ll format files. I'm not sure why you suddenly care about name mangling from point 8 onwards or what it has to do with all the rest. Am I missing something? – compor Nov 24 '20 at 09:22
  • Nah I actually want it to compile too, this is compile and graph (I hope...) :) – David Ledger Nov 24 '20 at 10:30
  • But regarding the name mangling, it's part of the Graphviz goal, to have a readable graph. – David Ledger Nov 24 '20 at 10:34
  • Hmm, well the demangling can happen anytime after, I don't see why to put it in with CMake or the compilation pipeline. `-emit-llvm` does not work for linking to object code, the error Clang emits is `error: -emit-llvm cannot be used when linking`. This is without the `-c` or `-S` flags. Up to this point, the fact that you're cross-compiling should not be relevant. I'm the author of [this](https://github.com/compor/llvm-ir-cmake-utils) BTW, and I use LTO or gllvm whenever I can get away with it. However you get the bitcode files, you can collate them in a single bitcode with `llvm-link`. – compor Nov 24 '20 at 10:49
  • `llvm-link` is a good idea thanks :) – David Ledger Nov 24 '20 at 11:55
  • `-emit-llvm` seems to be usable with the compiler if `-flto` is specified in the linker. It compiles at least, perhaps it shouldn't? – David Ledger Nov 24 '20 at 12:02
  • Yeah, LTO is a different use case. If you provide a minimal `CMakeLists.txt` it'd be useful. – compor Nov 24 '20 at 12:42
  • I added my macro :) It *should* work with other projects... And by work, I mean fail the same... – David Ledger Nov 24 '20 at 12:58

1 Answer


I'll attempt an answer just to gather all my comment responses so far.

If you want to "subvert" CMake, it can be done with something like this (adapted from the process linked in the OP's point 4 above):

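# COMMAND_EXPAND_LISTS needs CMake 3.8, and $<TARGET_OBJECTS> in custom commands needs 3.9
cmake_minimum_required(VERSION 3.9)

# the compiler has to be chosen before project() for the setting to take effect
set(CMAKE_C_COMPILER clang)

project(hello)

# append as one string; a list would put a ";" into the link line when flags are already set
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -flto")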

add_executable(hello main.c hello.c)

# decide your bitcode generation method here
# target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -emit-llvm)
target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -c -flto)

# this is just to print
add_custom_target(print_hello_objs 
  COMMAND ${CMAKE_COMMAND} -E echo $<JOIN:$<TARGET_OBJECTS:hello>," ">)

# this does some linking
# fill in details here as you need them (e.g., name, location, etc.)
add_custom_target(link_hello_objs 
  COMMAND llvm-link -o foo.bc $<TARGET_OBJECTS:hello> 
  COMMAND_EXPAND_LISTS)

For uses where processing on each file is required, the COMMAND can be an external script (bash/python) that just takes that list and generates the .dot files. The problem with generator expressions is that they are not evaluated till generation time in CMake and not in a foreach context.
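A minimal sketch of what such an external script might look like in Python. The function names are hypothetical, the tool flags are copied from the question (newer LLVM releases may need a different spelling for the call-graph pass), and it assumes llvm-dis and opt are on PATH and that the object paths arrive as separate arguments:

```python
"""Per-object call-graph driver (illustrative sketch; all names are hypothetical)."""
import subprocess
import sys
from pathlib import Path


def build_commands(obj_path):
    """Return the tool invocations for one LTO object (bitcode) file."""
    obj = Path(obj_path)
    ll = obj.with_name(obj.name + ".ll")  # e.g. thinga.obj -> thinga.obj.ll
    return [
        # Disassemble bitcode into textual IR at an explicit output path.
        ["llvm-dis", str(obj), "-o", str(ll)],
        # Flag as used in the question; per the question the graph is
        # written next to the input as <input>.callgraph.dot.
        ["opt", "-dot-callgraph", str(ll)],
    ]


def main(object_files):
    """Run the tool chain once per object file."""
    for obj in object_files:
        for cmd in build_commands(obj):
            subprocess.run(cmd, check=True)  # stop at the first failing tool
    return 0


# Only act when object files were actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved under a name of your choosing (say callgraph.py, also hypothetical), it could be invoked from a custom target in the same way as link_hello_objs, passing $<TARGET_OBJECTS:hello> together with COMMAND_EXPAND_LISTS so that each object becomes its own argument.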

If you want to trigger regeneration based on which object/bitcode file is recompiled, things get tricky, since CMake has preset ways to invoke the components of a toolchain (compiler, linker, etc.). That is why I wrote my CMake-based project back then. Still, I'd strongly recommend avoiding overengineering at the start, since it sounds as if you're not sure what you're up against yet.

I haven't bothered making LTO work fully so that a working executable is also produced, since I don't have such a setup on this machine ATM.

All the other requirements (e.g., Graphviz output, demangling) can be hooked up with further custom targets/commands.
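For the demangling part, one possible shape is a small Python helper that pulls the MSVC-mangled names out of a .dot file so they can be handed to a demangler one well-formed symbol at a time, then substitutes the results back. This is only a sketch: the function names are hypothetical, the character class in the regular expression is a simplifying assumption about what mangled names contain, and the mapping from mangled to readable names is expected to come from a tool such as llvm-undname or demumble (whose output is not parsed here).

```python
import re

# Assumption: MSVC-mangled C++ names start with '?' and use only these characters.
MSVC_SYMBOL = re.compile(r"\?[A-Za-z0-9_@$?]+")


def extract_symbols(dot_text):
    """Return the unique mangled names found in DOT text, in sorted order."""
    return sorted(set(MSVC_SYMBOL.findall(dot_text)))


def escape_record_label(text):
    """Backslash-escape characters that are special inside Graphviz record labels."""
    return re.sub(r'([{}|<>"])', r"\\\1", text)


def substitute_symbols(dot_text, demangled):
    """Replace each mangled name that has an entry in the `demangled` mapping."""
    def repl(match):
        name = match.group(0)
        if name in demangled:
            return escape_record_label(demangled[name])
        return name  # leave symbols without a mapping untouched

    return MSVC_SYMBOL.sub(repl, dot_text)
```

Escaping matters because demangled C++ names often contain angle brackets, which Graphviz would otherwise treat as record-label syntax.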

Other solutions might be:

  1. gllvm
  2. for the desperate llvm-ir-cmake-utils
compor
  • Yup that works :) Although I'm starting to think merging the llvm asm files was a bad plan... The graph is a little large lol. Any idea how to run opt once for each object file? – David Ledger Nov 24 '20 at 14:11
  • The `COMMAND` can be an external script (bash/python) that just takes that list and generates the .dot files. The problem with generator expressions is that they are not evaluated till generation time in CMake and not in a `foreach` context. If you wanna trigger regeneration based on what object/bitcode file is recompiled, things get tricky, hence why I wrote my CMake-based project back then, but I'd strongly recommend avoiding overengineering at the start since it sounds as if you're not sure what you're up against yet. – compor Nov 25 '20 at 06:17
  • Looks like I was able to do what I wanted with an external script pretty easily. What did you mean by subvert cmake? – David Ledger Nov 27 '20 at 00:21
  • In CMake, there are preset ways that describe how a toolchain is to be invoked (compile, link, etc.). So trying to extract the intermediate artifacts can be challenging at times. I'll add my previous comment and this to the answer. – compor Nov 27 '20 at 07:38