I have a C++ program that I'm building with Clang 3.9's profile-guided optimization feature. Here's what's supposed to happen:

  1. I build the program with instrumentation enabled.
  2. I run that program, creating a file with profile-data: prof.raw.
  3. I use llvm-profdata to convert prof.raw to a new file, prof.data.
  4. I create a new build of that same program, with a few changes:
    • When compiling each .cpp file to a .o file, I use the compiler flag -fprofile-use=prof.data.
    • When linking the executable, I also specify -fprofile-use.

I have a GNU Makefile for this, and it works great. My problem arises now that I'm trying to port that Makefile to CMake (3.7, but I could upgrade). I need the solution to work with (at least) the Unix Makefiles generator, but ideally it would work for all generators.

In CMake, I've defined two executable targets: foo-gen and foo-use:

  • When foo-gen is executed, it creates the prof.raw file.
  • I use add_custom_command to create a rule to create prof.data from prof.raw.
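For illustration only, the setup described so far might look roughly like the sketch below. It is not the asker's actual CMakeLists.txt: the source list and variable names are hypothetical, `target_link_options` assumes an upgrade to CMake 3.13 or newer, and the Clang flags shown are one common way to request instrumentation and profile use.

```cmake
cmake_minimum_required(VERSION 3.13)  # assumption: needed for target_link_options
project(foo CXX)

set(FOO_SOURCES main.cpp)  # hypothetical source list
set(PROF_RAW  "${CMAKE_CURRENT_BINARY_DIR}/prof.raw")
set(PROF_DATA "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

# Instrumented build: running foo-gen writes prof.raw.
add_executable(foo-gen ${FOO_SOURCES})
target_compile_options(foo-gen PRIVATE "-fprofile-instr-generate=${PROF_RAW}")
target_link_options(foo-gen PRIVATE "-fprofile-instr-generate=${PROF_RAW}")

# Training run, then conversion of prof.raw into prof.data.
add_custom_command(
  OUTPUT "${PROF_DATA}"
  COMMAND "$<TARGET_FILE:foo-gen>"
  COMMAND llvm-profdata merge "-output=${PROF_DATA}" "${PROF_RAW}"
  DEPENDS foo-gen
  COMMENT "Generating prof.data from a training run of foo-gen"
  VERBATIM)

# Optimized build: same sources, compiled and linked against the profile.
add_executable(foo-use ${FOO_SOURCES})
target_compile_options(foo-use PRIVATE "-fprofile-use=${PROF_DATA}")
target_link_options(foo-use PRIVATE "-fprofile-use=${PROF_DATA}")

# Missing piece: nothing here tells the buildsystem that each object file of
# foo-use must be recompiled when prof.data changes.
```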

My problem is that I can't figure out how to tell CMake that each of the object files depended upon by foo-use has a dependency on the file prof.data.

  • The most-promising idea I had was to (1) find a way to enumerate all of the .o files upon which foo-use depends, and then (2) iterate over each of those .o files, calling add_dependencies for each one.

    The problem with this approach is I can't find an idiomatic way, in my CMakeLists.txt file, to enumerate the list of object files upon which an executable depends. This might be an open problem with CMake.

  • I also considered using set_source_files_properties to set the OBJECT_DEPENDS property on each of my .cpp files used by foo-use, adding prof.data to that property's list.

    The problem with this (AFAICT) is that each of my .cpp files is used to create two different .o files: one for foo-gen and one for foo-use. I want the .o files that get linked into foo-use to have this compile-time dependency on prof.data; but the .o files that get linked into foo-gen must not have a compile-time dependency on prof.data.

    And AFAIK, set_source_files_properties doesn't let me set the OBJECT_DEPENDS property to have one of two values, contingent on whether foo-gen or foo-use is the current target of interest.
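A minimal sketch of why this second idea falls short, assuming both targets are defined in the same directory from the same source list (all names here are hypothetical):

```cmake
set(FOO_SOURCES main.cpp util.cpp)  # hypothetical source list
set(PROF_DATA "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

add_executable(foo-gen ${FOO_SOURCES})
add_executable(foo-use ${FOO_SOURCES})

# OBJECT_DEPENDS is a property of the source file, not of a (source, target)
# pair, so the objects built for foo-gen pick up the dependency as well as
# the objects built for foo-use.
set_source_files_properties(${FOO_SOURCES}
  PROPERTIES OBJECT_DEPENDS "${PROF_DATA}")
```

If prof.data is itself produced by running foo-gen, this also makes foo-gen's objects depend on a file that cannot exist until foo-gen has been built.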

Any suggestions for a clean(ish) way to make this work?

starball
Christian Convey
  • Can you please add a [mcve]? Some CMake code would help to suggest something. – Florian Feb 09 '17 at 14:54
  • @Florian, it is hard to add example code because CMake simply doesn't seem to have any sane way to do this in a single project. I'm trying to solve the exact same issue, and the simplest solution feels like using a generator expression in OBJECT_DEPENDS (not supported). I don't think one can use $<TARGET_OBJECTS:...> with add_dependencies, which seems to be the only sane way to access the object list. Potential other solution a) double project system where main user invoked project forwards settings to second pgo project compiling same settings again. b) Just replace build based system with shell script. – Pauli Nieminen Jun 03 '20 at 02:04
  • Does [this answer](https://stackoverflow.com/questions/45267352/cmake-compile-a-program-twice-in-a-row) your question? If it does, it's a shame because this question is slightly older and better written. – starball Aug 19 '22 at 08:32
  • update: I believe [the accepted answer in that question](https://stackoverflow.com/a/45268804/11107541) contains the very problem this question is trying to solve: it doesn't make each source file re-compile if the training profile data has changed. – starball Sep 01 '22 at 18:41

1 Answer

Discussion on author's idea #1

The most-promising idea I had was to (1) find a way to enumerate all of the .o files upon which foo-use depends, and then (2) iterate over each of those .o files, calling add_dependencies for each one.

This shouldn't work according to the documentation for add_dependencies, which states:

Makes a top-level <target> depend on other top-level targets to ensure that they build before <target> does.

I.e., you can't use it to make a target depend on files, only on other targets.
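The usual way to satisfy that restriction is to wrap the file in a custom target, as in the sketch below. The names are hypothetical, and it assumes that a custom command elsewhere lists prof.data as its OUTPUT and that the foo-use target already exists.

```cmake
set(PROF_DATA "${CMAKE_CURRENT_BINARY_DIR}/prof.data")

# A target that is "built" by bringing prof.data up to date.
add_custom_target(prof-data DEPENDS "${PROF_DATA}")

# Target-level dependency: prof-data is built before foo-use.
add_dependencies(foo-use prof-data)
```

As far as I can tell, this only orders the two targets. It ensures prof.data exists before foo-use is compiled, but it does not by itself cause foo-use's object files to be recompiled when prof.data changes, which is the behaviour the question asks for.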

Discussion on author's idea #2

I also considered using set_source_files_properties to set the OBJECT_DEPENDS property on each of my .cpp files used by foo-use, adding prof.data to that property's list.

The problem with this (AFAICT) is that each of my .cpp files is used to create two different .o files: one for foo-gen and one for foo-use. I want the .o files that get linked into foo-use to have this compile-time dependency on prof.data; but the .o files that get linked into foo-gen must not have a compile-time dependency on prof.data.

And AFAIK, set_source_files_properties doesn't let me set the OBJECT_DEPENDS property to have one of two values, contingent on whether foo-gen or foo-use is the current target of interest.

In the comments section, you mentioned that you could solve this if OBJECT_DEPENDS supported generator expressions, but it doesn't. As a side note, there is an issue ticket tracking this on the CMake GitLab repo. You can give it a thumbs up and describe your use case for their reference.

In the comments section you also mentioned a possible solution to this:

Potential other solution a) double project system where main user invoked project forwards settings to second pgo project compiling same settings again.

You can actually put this into the CMake project via ExternalProject so that it becomes part of the generated buildsystem: Make the top-level project include itself as an external project. The external project can be passed a cache variable to configure it to be the -gen version, and the top-level can be the -use version.
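A very rough sketch of that self-including arrangement follows. The cache variable name, the external project name, and the nested build directory are all hypothetical, and a real setup would also need to forward the compiler, build type, and other settings to the nested build.

```cmake
include(ExternalProject)

# Hypothetical cache variable that switches this project into its -gen flavor.
option(FOO_PGO_GEN "Configure as the instrumented (-gen) build" OFF)

if(NOT FOO_PGO_GEN)
  # Top-level (-use) configuration: build this same source tree a second
  # time, in a nested build directory, as the instrumented variant.
  ExternalProject_Add(foo-gen-build
    SOURCE_DIR      "${CMAKE_CURRENT_SOURCE_DIR}"
    BINARY_DIR      "${CMAKE_CURRENT_BINARY_DIR}/gen"
    CMAKE_ARGS      -DFOO_PGO_GEN=ON
    INSTALL_COMMAND ""   # nothing to install; only the build is needed
    BUILD_ALWAYS    ON)  # re-check the nested build on every top-level build
endif()
```

Because the -gen and -use variants then live in separate configured trees, each tree can set source-file properties such as OBJECT_DEPENDS without affecting the other.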

Speaking from experience, this is a whole other rabbit hole of long CMake-documentation-reading and finicking sessions if you have never manually invoked or done anything with ExternalProject before, so that answer might belong with a new question dedicated to it.

This works around the lack of generator expressions in OBJECT_DEPENDS. However, if the top-level project uses a multi-config generator and some of its configurations should not use PGO, you are back to square one.

Proposed Solution

Here's what I've found works to make sources re-compile when profile data changes:

  1. To the custom command which runs the training executable and produces and re-formats the training data, add another COMMAND which produces a C++ header file containing a timestamp in a comment.
  2. Include that header in all sources which you want to re-compile if the training has been re-run.
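The two steps above can be sketched on the CMake side as follows. This is an illustration under assumptions, not the exact code from my project: all variable, file, and target names are hypothetical, and the targets foo-gen (instrumented, writes prof.raw when run) and foo-use are assumed to exist already.

```cmake
set(PROF_RAW     "${CMAKE_CURRENT_BINARY_DIR}/prof.raw")
set(PROF_DATA    "${CMAKE_CURRENT_BINARY_DIR}/prof.data")
set(STAMP_HEADER "${CMAKE_CURRENT_BINARY_DIR}/pgo-stamp/pgo_stamp.h")
set(STAMP_SCRIPT "${CMAKE_CURRENT_BINARY_DIR}/write_pgo_stamp.cmake")

# Helper script, executed at build time via `cmake -P`. The bracket argument
# keeps ${...} unexpanded here, so it is evaluated when the script runs.
file(WRITE "${STAMP_SCRIPT}" [=[
string(TIMESTAMP now "%Y-%m-%dT%H:%M:%SZ" UTC)
file(WRITE "${STAMP_HEADER}" "// PGO training data generated at ${now}\n")
]=])

add_custom_command(
  OUTPUT "${PROF_DATA}" "${STAMP_HEADER}"
  COMMAND "$<TARGET_FILE:foo-gen>"                                  # training run
  COMMAND llvm-profdata merge "-output=${PROF_DATA}" "${PROF_RAW}"  # re-format
  COMMAND "${CMAKE_COMMAND}" "-DSTAMP_HEADER=${STAMP_HEADER}" -P "${STAMP_SCRIPT}"
  DEPENDS foo-gen
  COMMENT "Running training workload and refreshing the PGO stamp header"
  VERBATIM)

# Drive the custom command and make sure it runs before foo-use is compiled.
add_custom_target(pgo-train DEPENDS "${PROF_DATA}" "${STAMP_HEADER}")
add_dependencies(foo-use pgo-train)
```

Writing the header from a `cmake -P` script avoids relying on shell redirection, which may not behave the same way across generators.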

If you want to support non-PGO builds, wrap the timestamp header in a header which checks that it exists with __has_include and only includes it if it exists.
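One possible shape for such a wrapper is sketched below. The file and directory names are hypothetical, and the wrapper could just as well be an ordinary checked-in header; it is written from CMake here only to keep the sketch in one place. The targets foo-gen and foo-use are assumed to exist.

```cmake
set(WRAPPER_DIR "${CMAKE_CURRENT_BINARY_DIR}/pgo-wrapper")
set(STAMP_DIR   "${CMAKE_CURRENT_BINARY_DIR}/pgo-stamp")

# Wrapper header: pulls in the generated stamp only if the compiler can find it.
file(WRITE "${WRAPPER_DIR}/pgo_stamp_optional.h" [=[
#if defined(__has_include)
#  if __has_include("pgo_stamp.h")
#    include "pgo_stamp.h"
#  endif
#endif
]=])

# Both targets can see the wrapper...
target_include_directories(foo-gen PRIVATE "${WRAPPER_DIR}")
target_include_directories(foo-use PRIVATE "${WRAPPER_DIR}")

# ...but only the optimized target can see the directory holding the stamp.
target_include_directories(foo-use PRIVATE "${STAMP_DIR}")
```

Keeping the stamp in a directory that only the PGO target has on its include path is a design choice of this sketch: the instrumented build then never picks up a dependency on a header that is produced by running it.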

I'm pretty sure with this approach, CMake doesn't do the dependency checking of TUs on the profile data, and instead, it's the generated buildsystem's header-dependency tracking which does that work. The rationale for including a timestamp comment in the header file instead of just "touch"ing it to change the timestamp in the filesystem is that the generated buildsystem might detect changes by file contents instead of by the filesystem timestamp.

All the shortcomings of the proposed solution

The painfully obvious weakness of this approach is that you need to add a header include to all the .cpp files that you want to check for re-compilation. There are several problems that can spawn from this (from least to most egregious):

  1. You might not like it from an aesthetics standpoint.

  2. It opens the door to human error: forgetting to include the header in new .cpp files. I don't know of a complete fix for that, but some compilers have a flag that includes a file in every source file, such as GCC's -include flag and MSVC's /FI flag. You can add such a flag to a CMake target using target_compile_options(<target> PRIVATE "SHELL:-include <path>")

  3. You might not be able to change some of the sources that you need to re-compile, such as sources from third-party static libraries that your library depends on. There may be workarounds if you're using ExternalProject by doing something with the patch step, but I don't know.
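For the forced-include mitigation mentioned in item 2, a hedged sketch might look like this. The header path is hypothetical, the foo-use target is assumed to exist, and the `SHELL:` prefix requires CMake 3.12 or newer.

```cmake
set(STAMP_HEADER "${CMAKE_CURRENT_BINARY_DIR}/pgo-stamp/pgo_stamp.h")

# Force-include the stamp header in every translation unit of foo-use only.
if(MSVC)
  target_compile_options(foo-use PRIVATE "/FI${STAMP_HEADER}")
else()
  # SHELL: keeps "-include <path>" together as a pair of arguments.
  target_compile_options(foo-use PRIVATE "SHELL:-include ${STAMP_HEADER}")
endif()
```

Whether a forced include is picked up by the generated buildsystem's dependency tracking may depend on the generator and on how dependencies are scanned, so it is worth verifying that touching the header really triggers a recompile.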

For my personal project, #1 and #2 are acceptable, and #3 happens to not be an issue. You can take a look at how I'm doing things there if you're interested.

Toward a standard PGO CMake module

See https://gitlab.kitware.com/cmake/cmake/-/issues/19273

starball