199

CMake offers several ways to specify the source files for a target. One is to use globbing (documentation), for example:

FILE(GLOB MY_SRCS dir/*)

Another method is to specify each file individually.

Which way is preferred? Globbing seems easy, but I heard it has some downsides.

Kevin
Marenz
  • One aspect I don't see discussed in any current answers is the relation to deterministic/reproducible builds. See [this section of a Conan article on the subject](https://blog.conan.io/2019/09/02/Deterministic-builds-with-C-C++.html#file-order-feeding-to-the-build-system). [`file(GLOB)`](https://cmake.org/cmake/help/latest/command/file.html#glob) doesn't have a specified order until v3.6, where it is specified to be lexicographical. I wish I could say today that everyone is past v3.6, but I can't. On the bright (?) side, some people have never even heard of deterministic/reproducible builds. – starball Oct 21 '22 at 04:19

7 Answers

226

Full disclosure: I originally preferred the globbing approach for its simplicity, but over the years I have come to recognise that explicitly listing the files is less error-prone for large, multi-developer projects.

Original answer:


The advantages to globbing are:

  • It's easy to add new files as they are only listed in one place: on disk. Not globbing creates duplication.

  • Your CMakeLists.txt file will be shorter. This is a big plus if you have lots of files. Not globbing causes you to lose the CMake logic amongst huge lists of files.

The advantages of using hardcoded file lists are:

  • CMake will track the dependencies of a new file on disk correctly. If we use glob, files that were not matched the first time you ran CMake will not get picked up

  • You ensure that only files you want are added. Globbing may pick up stray files that you do not want.

In order to work around the first issue, you can simply "touch" the CMakeLists.txt that does the glob, either by using the touch command or by writing the file with no changes. This will force CMake to re-run and pick up the new file.

To fix the second problem you can organize your code carefully into directories, which is what you probably do anyway. In the worst case, you can use the list(REMOVE_ITEM) command to clean up the globbed list of files:

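# Resolve the unwanted file to the same absolute path form the source glob produces.
file(GLOB to_remove file_to_remove.cpp)
# MY_SRCS is the list filled by the earlier file(GLOB MY_SRCS ...) call.
list(REMOVE_ITEM MY_SRCS ${to_remove})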
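As a rough sketch of the same clean-up for a whole family of stray files, a pattern-based filter can be used instead of naming each file. The `_scratch.cpp` suffix is a made-up convention for illustration only, and `list(FILTER)` needs CMake 3.6 or newer:

```cmake
# Collect candidate sources (dir/ is the directory used in the question).
file(GLOB MY_SRCS dir/*.cpp)

# Hypothetical convention: files ending in _scratch.cpp are never built.
# list(FILTER) requires CMake 3.6 or newer.
list(FILTER MY_SRCS EXCLUDE REGEX "_scratch\\.cpp$")
```

The regular expression is matched against each full path in the list, so the `$` anchor keeps it from matching directory names.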

The only real situation where this can bite you is if you are using something like git-bisect to try older versions of your code in the same build directory. In that case, you may have to clean and compile more than necessary to ensure you get the right files in the list. This is such a corner case, and one where you already are on your toes, that it isn't really an issue.

Kevin
richq
  • 1
    Also bad with globbing: git's difftool files are stored as $basename.$ext.$type.$pid.$ext which can cause fun errors when trying to compile after a single merge resolution. – mathstuf Mar 04 '13 at 21:43
  • 12
    I think this answer glosses over the drawbacks of cmake missing new files, `Simply "touch" the CMakeLists.txt` is OK if you are the developer, but for others building your software it can really be a pain-point that your build fails after updating and the burden is on **them** to investigate why. – ideasman42 Aug 30 '13 at 17:38
  • `Not globbing creates duplication` Not true if you let CMake check if the file exists on disk, and have it generate the ones that don't. You can even use different templates based on the file name and extension. The meaning of `;` in CMake makes it not fun to work with C++ templates though. And it does cause CMake to take longer. Oh, and it should be switched off on the continuous integration server. But it's still better than creating files manually (or worse, copy-paste) in my opinion. – Andreas Haferburg May 06 '15 at 22:31
  • 52
    You know what? Since writing this answer **6 years ago**, I've changed my mind a bit and now prefer to explicitly list files. Its only real disadvantage is "it's a bit more work to add a file", but it saves you all sorts of headaches. And in a lot of ways explicit is better than implicit. – richq May 07 '15 at 12:02
  • 2
    @richq Would [this git hook](http://stackoverflow.com/a/17838951/2436175) make you reconsider your current position? :) – Antonio Jun 04 '15 at 13:00
  • The git hook isn't a reliable way to re-run CMake. see comments: http://stackoverflow.com/a/17838951/432509 – ideasman42 Jun 06 '15 at 04:35
  • @richq would you please update your answer - with an edit mark or so, because implicit issues really beat us again and again. – Fei Jan 26 '16 at 04:56
  • 1
    @ruslo I don't think it would be fair to the 100+ people that upvoted the answer as it was. – Antonio Apr 26 '16 at 10:21
  • 1
    @ruslo The subject is still disputed, and pros and cons are clearly mentioned in this webpage. – Antonio Apr 26 '16 at 11:09
  • 13
    Well as Antonio says, the votes were given for advocating the "globbing" approach. Changing the nature of the answer is a bait-and-switch thing to do for those voters. As a compromise I've added an edit to reflect my changed opinion. I apologise to the internet for causing such a storm in a teacup :-P – richq Apr 26 '16 at 11:38
  • 3
    I find that the best compromise is to have a tiny external script (e.g. python) that updates the CMakeLists.txt file with the sources. It combines the benefits of having files added for you automatically (so prevents potential typos in file paths), but produces an explicit manifest that can be checked into source control (so deltas are obvious/tracked and git-bisect will work), allows inspection to debug issues when spurious or unintended files were included, and properly triggers cmake to know when it should rerun. – Grayson Lang Jul 16 '17 at 05:19
  • 3
    There is one more disadvantage to globbing. The files end up as absolute paths, so if you want to use the file names (with leading relative path) to reproduce a folder structure somewhere else as part of your build, suddenly those leading paths are absolute. – 0xC0000022L Jul 30 '19 at 08:51
  • 1
    this dogma from the cpp community is the reason why this language is so far behind and why the build systems will not reach their full potentials....! – hiradyazdan May 03 '21 at 12:14
  • +1 for the git-bisect comment, the first sane argument against globbing. Still, I prefer to run a script that detects changes in directory contents with specific file endings and touches CMakeLists.txt files automatically. Normally I do not even care for small to medium sized projects, but after your bisect comment I will. – Patrick Fromberg Aug 14 '23 at 10:07
  • @hiradyazdan, I totally agree. But even worse, it should be mandatory to do it one way or the other, even if the consensus were against my liking . Build systems are the one thing that do not have to be backward compatible, so we had a chance to get that right from start. Committee believes not their business. Build system devs believe they need to support any solution that people invented out of despair not having such a system and we have to learn how to specify all that nonsense that should never have come into existence, and then we have to learn just that for 3 or 4 build systems. – Patrick Fromberg Aug 14 '23 at 10:42
  • I cannot resist making a prophecy. We will make our tools and libraries so inconvenient to use, that only AI will use them in the future. After all, what would be the purpose of using copilot, if libraries and tools where perfect?. The purpose of a prompt is to be able to say, I need following but have no means to express it succinctly. The AI will not care if globbing or not. It will not care if it has to write hundred or one million lines of code. – Patrick Fromberg Aug 14 '23 at 10:59
143

The best way to specify source files in CMake is by listing them explicitly.
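For illustration, an explicit list might look like the following minimal sketch. The target name `my_target` and the file names are placeholders, not taken from any real project:

```cmake
# Hypothetical target and file names, for illustration only.
add_executable(my_target
    src/main.cpp
    src/parser.cpp
    src/parser.h
)
```

Listing one file per line keeps version-control diffs small when a file is added or removed.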

The creators of CMake themselves advise not to use globbing.

See: Filesystem

(We do not recommend using GLOB to collect a list of source files from your source tree. If no CMakeLists.txt file changes when a source is added or removed then the generated build system cannot know when to ask CMake to regenerate.)

Of course, you might want to know what the downsides are - read on!


When Globbing Fails:

The big disadvantage to globbing is that creating/deleting files won't automatically update the build system.

If you are the person adding the files, this may seem an acceptable trade-off. However, this causes problems for other people building your code; they update the project from version control, run the build, and then contact you, complaining that "the build's broken".

To make matters worse, the failure typically shows up as a linker error that gives no hint about the cause of the problem, and time is lost troubleshooting it.

In a project I worked on, we started off globbing, but we got so many complaints when new files were added that it was enough reason to explicitly list files instead of globbing.

This also breaks common Git workflows (git bisect and switching between feature branches).

So I couldn't recommend this. The problems it causes far outweigh the convenience. When someone can't build your software because of this, they may lose a lot of time to track down the issue or just give up.

And another note. Just remembering to touch CMakeLists.txt isn't always enough. With automated builds that use globbing, I had to run cmake before every build since files might have been added/removed since the last build*.

Exceptions to the rule:

There are times where globbing is preferable:

  • For setting up a CMakeLists.txt file for existing projects that don't use CMake.
    It's a fast way to get all the source referenced (once the build system is running, replace globbing with explicit file lists).
  • When CMake isn't used as the primary build system, for example if you're using a project that doesn't use CMake and you would like to maintain your own build system for it.
  • For any situation where the file list changes so often that it becomes impractical to maintain. In this case, it could be useful, but then you have to accept running cmake to generate build-files every time to get a reliable/correct build (which goes against the intention of CMake - the ability to split configuration from building).

*Yes, I could have written code to compare the tree of files on disk before and after an update, but this is not such a nice workaround and something better left up to the build system.

Peter Mortensen
ideasman42
  • 12
    "The big disadvantage to globbing is that creating new files won't automatically update the build-system." But isn't it true that if you don't glob, you still have to manually update CMakeLists.txt, meaning cmake is still not automatically updating the build system? It seems like either way you must remember to manually do something in order for the new files to build. Touching CMakeLists.txt seems easier than opening it up and editing it to add the new file. – dafalcon Jan 09 '14 at 14:35
  • 23
    @Dan, for **your** system - sure, if you only develop alone this is fine, but what about everyone else who builds your project? are you going to email them to go and manually touch the CMake file? every time a file is added or removed? - Storing the file list in CMake ensures the build is always using the same files vcs knows about. Believe me - this is not just some subtle detail - When your build fails for many devs - they mail lists and ask on IRC that the code is broken. Note: *(Even on your own system you may go back in git history for eg, and not think to go in and touch CMake files)* – ideasman42 Jan 09 '14 at 14:52
  • 2
    Ah I had not thought of that case. That is the best reason I've heard against globbing. I wish the cmake docs expanded on why they recommend people avoid globbing. – dafalcon Jan 09 '14 at 22:20
  • @ideasman42 We got around the problem with [a git hook](http://stackoverflow.com/a/17838951/2436175) – Antonio Jun 04 '15 at 12:55
  • @Antonio, git hooks work in many, but not all cases (applying a patch manually, or un-stashing a change). For me, working **reliably**, trumps working **sometimes**, of course YMMV. – ideasman42 Aug 02 '15 at 04:04
  • 1
    I've been thinking about solution of writing timestamp of last cmake execution into file. The only problems are: 1) it probably has to be done by cmake to be crossplatform and so we need to avoid cmake running itself second time somehow. 2) Possibly more merge conflicts (which still happen with file list btw) They could actually be resolved trivially in this case by taking later timestamp. – Predelnik Feb 16 '17 at 11:32
  • Gotta say, it's probably not neat enough for some people, but much less hassle (and less error prone) than maintaining a list of files in CMakeLists.txt is to just make an insignificant change to the file, such as adding a space to the end (or removing it from last time). When others check it out, CMake will then automatically rebuild and reglob. But it would be nice if CMake created a filetree_updated file that you could check in, which it would automatically change each time the glob of files updated. – Tim MB Apr 17 '17 at 19:38
  • 1
    @tim-mb, in practice it's likely developers would forget to touch/modify this file. Once a project is setup, how many files do you add/remove each day anyway? I think this is mainly something you run into when *moving* to CMake. Once you're using CMake, the effort to update the list of files is very small in my experience. In the case you add 10+ files you just copy-paste from a `git-status` or output of `find`. – ideasman42 Apr 18 '17 at 08:47
  • Fair enough. My process is to create files and then touch the cmake file to get cmake to load them into the IDE so I never really forget, and were I to miss the files then the linker would fail so it's not really a biggie. I guess it also depends on whether your coding style involves creating a lot of files. – Tim MB Apr 19 '17 at 10:05
  • 1
    @tim-mb as mentioned in the answer, this is only really practical if you work alone, and even then it will cause problems when bisecting or switching branches. – ideasman42 Apr 19 '17 at 11:31
  • 2
    @tim-mb, "But it would be nice if CMake created a filetree_updated file that you could check in, which it would automatically change each time the glob of files updated." - you have just exactly described what my answer does. – Glen Knowles Oct 15 '17 at 08:19
  • Thanks. So 'listing them explicitly' is the recommended way. How to do that in CMakeLists.txt ? Thanks a lot. – toto_tata Jul 01 '21 at 08:35
  • You have to manually maintain a list. Typically something like `set(SRC src/some.c src/file.c src/paths.h)` for larger projects it's common to list one file per line. – ideasman42 Jul 02 '21 at 01:52
  • Explicitly listing all files does not sound right. I just wondering what stops CMAKE to have a better error message than "the build's broken". From my point of view. The error message is broken. It will be better to find a way to have better error messages (e.g have a script to scan files name and compare cached to files) than suggest developers manually add files to a build system, it is 2022 already not 0222 – r0n9 Nov 26 '22 at 21:31
  • @r0n9 this is how CMake works, even if you think it shouldn't. The error message is generally from the compiler or linker, so it's out of CMake's control and will vary depending on the compiler but often relates to missing symbols (as files are missing from the resulting build). That you think this is wrong or bad in some way is understandable. Nevertheless, this is the current state of CMake, you are free to write your own tools to detect these cases but chances are others won't use them. Other build systems such as Meson also discourage this: https://stackoverflow.com/a/49014401/432509 – ideasman42 Nov 28 '22 at 06:19
67

In CMake 3.12, the file(GLOB ...) and file(GLOB_RECURSE ...) commands gained a CONFIGURE_DEPENDS option which reruns cmake if the glob's value changes. As that was the primary disadvantage of globbing for source files, it is now okay to do so:

# Whenever this glob's value changes, cmake will rerun and update the build with the
# new/removed files.
file(GLOB_RECURSE sources CONFIGURE_DEPENDS "*.cpp")

add_executable(my_target ${sources})

However, some people still recommend avoiding globbing for sources. Indeed, the documentation states:

We do not recommend using GLOB to collect a list of source files from your source tree. ... The CONFIGURE_DEPENDS flag may not work reliably on all generators, or if a new generator is added in the future that cannot support it, projects using it will be stuck. Even if CONFIGURE_DEPENDS works reliably, there is still a cost to perform the check on every rebuild.

Personally, I consider the benefits of not having to manually manage the source file list to outweigh the possible drawbacks. If you do have to switch back to manually listed files, this can be easily achieved by just printing the globbed source list and pasting it back in.
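A minimal sketch of that "print and paste" step, assuming the same `sources` variable as in the example above: it prints every globbed file relative to the current source directory, one per line, so the output can be copied into an explicit list.

```cmake
# Sketch only: glob as before, then print each entry relative to the
# directory of the current CMakeLists.txt.
file(GLOB_RECURSE sources CONFIGURE_DEPENDS "*.cpp")

foreach(src IN LISTS sources)
    file(RELATIVE_PATH rel_src "${CMAKE_CURRENT_SOURCE_DIR}" "${src}")
    message(STATUS "    ${rel_src}")
endforeach()
```

The paths are made relative because `file(GLOB_RECURSE)` returns absolute paths by default, which would not be portable if pasted into a checked-in file.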

starball
Justin
  • 1
    If your build system performs a complete cmake and build cycle (delete the build directory, run cmake from there and then invoke the makefile), provided they don't pull in unwanted files, surely there are no drawbacks to using GLOBbed sources? In my experience the cmake part runs much more quickly than the build, so it's not that much of an overhead anyway – Den-Jason Mar 03 '20 at 14:33
  • 7
    The real, well-thought-out answer, always nearing the bottom of the page. Those who would rather keep updating the file fail to notice that the actual loss of efficiency isn't in the order of seconds as the file gets manually updated, or literally nanoseconds in performing the check, but possibly _days or weeks_ accumulated as the programmer loses their flow while messing with the file, or postpones their work altogether, just because they don't want to update it. Thank You for this answer, truly a service to humanity... And thanks to the CMake folks for finally patching it up! :) – S. Exchange Considered Harmful Aug 02 '20 at 17:33
  • 1
    "possibly days or weeks accumulated as the programmer loses their flow while messing with the file" -- Citation needed – Alex Reinking Jan 31 '21 at 11:26
  • @AlexReinking Don't forget to @ the user you are replying to. – Justin Jan 31 '21 at 21:35
  • 1
    @S.ExchangeConsideredHarmful I have seen so much time lost on broken builds because of globbing that you would need to do millions of updates to the file list to make up for it. It does not happen *frequently*, but **1)** every occurrence immediately incurs bug reports and headaches, and **2)** the issue is frequent enough that people working on those projects, whenever something does not build, now always start by clearing cmake cache and rebuilding. This habit wastes hours per week, and it is a direct consequence of the use of file(GLOB). – spectras Feb 10 '21 at 16:39
  • Seems like this would also solve the problem mentioned in [@ideasmans42's answer](https://stackoverflow.com/a/18538444/751579) by properly regenerating the build files when changing branches, bisecting, etc. No? – davidbak May 04 '21 at 16:06
  • @davidbak I don't fully understand ideasmans42's case, but this should solve it. CMake inserts commands into the generated build system to re-evaluate this glob before you build every time you build, re-running `cmake` if the glob doesn't match with the last time CMake configured the project. So changing branches, bisecting, etc. change the files, which changes the glob's evaluation, which should force CMake to re-configure – Justin May 04 '21 at 16:14
  • 1
    IDEs like Clion already manage updates to `CMakeLists.txt`. Given this, and the fact that cmake's own maintainers recommend against abusing GLOB to handle source files, I don't understand why some people continue pushing antipatterns. – RAM Dec 20 '21 at 09:43
  • @RAM because it's DUMB to list every single file? Even when cmake is doing it it is still making huge mess of unmaintainable text blob of files with paths in a `CMakeLists.txt` – Enerccio Sep 08 '22 at 05:49
  • @Enerccio listing exactly which file is included in your project is the opposite of dumb, given the whole point of a build system is to define and process exactly each and every target. Keep in mind that [cmake's own reference documentation](https://cmake.org/cmake/help/v3.0/command/file.html) recommends against this practice. Thus either you believe you know better than the people behind CMake or you're talking about things you know little about. – RAM Sep 09 '22 at 10:46
10

You can safely glob (and probably should) at the cost of an additional file to hold the dependencies.

Add functions like these somewhere:

# Compare the new contents with the existing file, if it exists and is the
# same we don't want to trigger a make by changing its timestamp.
function(update_file path content)
    set(old_content "")
    if(EXISTS "${path}")
        file(READ "${path}" old_content)
    endif()
    if(NOT old_content STREQUAL content)
        file(WRITE "${path}" "${content}")
    endif()
endfunction(update_file)

# Creates a file called CMakeDeps.cmake next to your CMakeLists.txt with
# the list of dependencies in it - this file should be treated as part of
# CMakeLists.txt (source controlled, etc.).
function(update_deps_file deps)
    # Use a full path: if(EXISTS) is only well-defined for full paths.
    set(deps_file "${CMAKE_CURRENT_SOURCE_DIR}/CMakeDeps.cmake")
    # Normalize the list so it's the same on every machine
    list(REMOVE_DUPLICATES deps)
    foreach(dep IN LISTS deps)
        file(RELATIVE_PATH rel_dep ${CMAKE_CURRENT_SOURCE_DIR} ${dep})
        list(APPEND rel_deps ${rel_dep})
    endforeach(dep)
    list(SORT rel_deps)
    # Update the deps file
    set(content "# generated by make process\nset(sources ${rel_deps})\n")
    update_file(${deps_file} "${content}")
    # Include the file so it's tracked as a generation dependency; we don't
    # need the content.
    include(${deps_file})
endfunction(update_deps_file)

And then go globbing:

file(GLOB_RECURSE sources LIST_DIRECTORIES false *.h *.cpp)
update_deps_file("${sources}")
add_executable(my_target ${sources})

You're still carting around the explicit dependencies (and triggering all the automated builds!) like before, only it's in two files instead of one.

The only change in procedure is after you've created a new file. If you don't glob, the workflow is to modify CMakeLists.txt from inside Visual Studio and rebuild. If you do glob, you run cmake explicitly - or just touch CMakeLists.txt.

Peter Mortensen
Glen Knowles
  • 1
    At first I thought this was a tool that would automatically update the Makefiles when a source file is added, but I see now what its value is. Nice! This solves concern of someone updating from the repository and having `make` give strange linker errors. – Cris Luengo Jul 14 '17 at 22:12
  • 1
    I believe this could be a good method. One of course has still to remember to trigger cmake after adding or removing a file, and it is also require committing this dependency file, so some education on the user side is necessary. The major drawback could be that this dependency file could originate nasty merge conflicts which might be difficult to solve without again requiring the developer to have some understanding of the mechanism. – Antonio Oct 30 '17 at 16:09
  • 1
    This won't work if your project has conditionally included files (eg, some files which are only used when a feature is enabled, or only used for a particular operating-system). It's common enough for portable software that some files are only used for specific platforms. – ideasman42 Jan 07 '18 at 04:24
2

Specify each file individually!

I use a conventional CMakeLists.txt and a Python script to update it. I run the Python script manually after adding files.

See my answer at How to collect source files with CMake without globbing?.

Peter Mortensen
palfi
0

I'm not a fan of globbing and have never used it for my libraries. But recently I watched a presentation by Robert Schumacher (vcpkg developer) where he recommends treating all your library sources as separate components (for example, private sources (.cpp), public headers (.h), tests, and examples are all separate components) and using separate folders for each of them (similar to how we use C++ namespaces for classes). In that case I think globbing makes sense, because it lets you clearly express this component approach and encourages other developers to follow it. For example, your library directory structure can be the following:

  • /include - for public headers
  • /src - for private headers and sources
  • /tests - for tests

You obviously want other developers to follow your convention (i.e., place public headers under /include and tests under /tests). file(GLOB) gives developers a hint that all files in a directory have the same conceptual meaning, and that any file placed in this directory that matches the glob pattern will be treated the same way (for example, installed during 'make install' if we are talking about public headers).
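One way this per-directory convention could be expressed is sketched below. The library name `mylib`, the variable names, and the file extensions are assumptions made for illustration; `CONFIGURE_DEPENDS` needs CMake 3.12 or newer.

```cmake
# Hypothetical library "mylib" using the /include and /src layout above.
file(GLOB mylib_public_headers CONFIGURE_DEPENDS
    "${CMAKE_CURRENT_SOURCE_DIR}/include/*.h")
file(GLOB mylib_private_sources CONFIGURE_DEPENDS
    "${CMAKE_CURRENT_SOURCE_DIR}/src/*.h"
    "${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp")

add_library(mylib ${mylib_private_sources} ${mylib_public_headers})
target_include_directories(mylib PUBLIC
    "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
    "$<INSTALL_INTERFACE:include>")

# Everything matched under include/ is treated as a public header and installed.
install(FILES ${mylib_public_headers} DESTINATION include)
```

Each glob maps one directory to one role, so where a file is placed decides how it is built and whether it is installed.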

-1

This might be a useful cog:

It's in PowerShell, but any other scripting language will do... It's just one possible addition to the stuff mentioned above.

Get a recursive list of the code files: $res=$( Get-ChildItem -Path $root -Recurse -Attributes !Directory -Name -Include *.h,*.c,CMakeLists.txt )

Concatenate each line/element of the returned Object[] into a single string and compute a hash for it. Store the hash in a file at the root of whichever folder(s) you query; typically these are the components, main, etc. folders. Each compile script then checks a freshly computed hash against the stored one. In case of a mismatch (the file layout changed), a cmake reconfigure is required; store the fresh hash, then repeat the check on the next build.

Hash from a string:

function stringhash {
    PARAM (
        [Parameter(Mandatory, Position = 0)]
        [string]
        $source)

    $stringAsStream = [System.IO.MemoryStream]::new()
    $writer = [System.IO.StreamWriter]::new($stringAsStream)
    $writer.write("$($source)")
    $writer.Flush()
    $stringAsStream.Position = 0
    $res = (Get-FileHash -InputStream $stringAsStream | Select-Object Hash)
    $writer.Close()
    $stringAsStream.Close()
    return $res.Hash.ToUpper()
}
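
The same check could also be sketched in CMake's own script mode, which avoids depending on a particular shell. This is only an illustration of the idea above, not the author's script: the file name `check_layout.cmake`, the `.layout_hash` location, and the choice of SHA256 are all assumptions, and only `*.h` and `*.c` files are listed here.

```cmake
# check_layout.cmake (hypothetical name); run as: cmake -P check_layout.cmake
cmake_minimum_required(VERSION 3.12)

# List code files relative to this script's directory, so the hash does not
# depend on where the tree is checked out.
file(GLOB_RECURSE code_files
    RELATIVE "${CMAKE_CURRENT_LIST_DIR}"
    "${CMAKE_CURRENT_LIST_DIR}/*.h"
    "${CMAKE_CURRENT_LIST_DIR}/*.c")
list(SORT code_files)

# Hash the joined listing.
string(SHA256 new_hash "${code_files}")

# Hypothetical location for the stored hash.
set(hash_file "${CMAKE_CURRENT_LIST_DIR}/.layout_hash")
set(old_hash "")
if(EXISTS "${hash_file}")
    file(READ "${hash_file}" old_hash)
endif()

# On a mismatch, store the fresh hash and report that a reconfigure is needed.
if(NOT "${new_hash}" STREQUAL "${old_hash}")
    file(WRITE "${hash_file}" "${new_hash}")
    message(STATUS "File layout changed: a cmake reconfigure is required")
endif()
```

A build wrapper would run this script first and trigger the configure step whenever the message appears.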