
I'm new at my current company and working on a project written by my direct team lead. The company doesn't usually work with C++, but there is production code written by my coworker in C/C++. We are the only ones who know how to code in C++ (me and my lead), so there is no third opinion that can be brought in.

After I got enough insight into the project I realized the whole structure is... special.

It actually consists of a single compilation unit: the makefile lists main.hpp as its one and only source file.

This header file then includes all the source files the project consists of, so it looks like a really big list of this:

#include "foo.cpp"
#include "bar.cpp"

While trying to understand the logic behind it, I realized that this does indeed work for this project, as it's just an interface where each unit can operate without accessing any other unit. At some point I asked him what his reasons were for doing it this way.

I got a defensive reaction with the argument:

Well, it is working, isn't it? You are free to do it your way if you think that's better for you.

And that's what I'm doing now, simply because I'm really having trouble thinking within this structure. So for now I'm applying the "usual" structure to the implementation I'm currently writing, while making only mandatory changes to the rest of the project, to demonstrate how I would have designed it.

I think there are a lot of drawbacks, starting with the fact that mixing the linker's and the compiler's jobs through the project structure can't serve us well, up to optimizations that will probably end in redundant or obscure results, not to mention that a clean build of the project takes ~30 minutes, which I think might be caused by the structure as well. But I lack the knowledge to name real, and not just hypothetical, issues with this.

And as his argument "It works my way, doesn't it?" holds true, I would like to be able to explain to him why it's a bad idea anyway, rather than coming across as the new nitpicky guy.

So what problems could actually be caused by such a project structure? Or am I overreacting and such a structure is totally fine?

dhein
  • IMHO - Bail. Working with people like this will be a thing of nightmares. – UKMonkey Sep 20 '17 at 10:30
  • Having only one compilation unit means that you have to re-compile everything any time you make even the slightest change. A well-separated structure with many units greatly reduces compilation time. – AMA Sep 20 '17 at 10:37
  • Does the big header file have include guards? – Griffin Sep 20 '17 at 10:40
  • I know that some projects put all the source into the same unit to enable additional compilation optimizations and squeeze out some extra performance. However, this is only done as part of preparing a release. Development is still done with a proper multi-unit structure. – AMA Sep 20 '17 at 10:40
  • @Griffin: nope. I mean it's not supposed to be included anywhere at all. The only place the main.hpp is referred to is in the makefile, as the one and only ___source___ file. – dhein Sep 20 '17 at 10:43
  • Oh right, just one big fat file then. How big is the file? Now I understand your question is about finding a good argument for having multiple compilation units; I'm just trying to understand the situation. – Griffin Sep 20 '17 at 10:48
  • @dhein have you searched for 'but we've always done it that way'? The correct reply from him would be to explain WHY they do it that way, and why the cost of extra compilation time is worth it. If they can't do that then you gain nothing, and they don't know why they're doing what they're doing. People who do things without knowing why they're doing them, in any engineering discipline, are a liability. – UKMonkey Sep 20 '17 at 10:50
  • @Griffin: Well, the main consists of ~100 forward declarations (which seem obsolete to me) and, after that, the corresponding include for each of them. But for some of them this goes down into additional layers, as some of those `.cpp`'s contain other `.hpp`'s which contain more includes themselves. – dhein Sep 20 '17 at 10:58
  • @UKMonkey: Well, this coworker is easily offended; I think that's the problem, rather than him not knowing what he does. As said, no one else here codes in C/C++, and he's a reasonable person, so I think when I come to him with canonical reasons, he will be open to this change. – dhein Sep 20 '17 at 11:01
  • @dhein I would say the best argument is the compilation time (as mentioned in the answers; also [this](https://stackoverflow.com/a/1686421/1981061)), and the second would be readability, which is usually underrated among programmers. – Griffin Sep 20 '17 at 11:04
  • [SQLite](http://sqlite.org/) does this. They call it "amalgamation", and that famous library is distributed as a single source file. – Basile Starynkevitch Sep 20 '17 at 11:36
  • @BasileStarynkevitch What they are doing is somewhat different. First, it is a C project. Second, they [keep it as a bunch of regular .c files](https://www.sqlite.org/src/dir?ci=4b3f7eacb862fbb5&name=src) but then merge them (and some generated sources) into a single huge file. I guess this really simplifies life for end developers, as they don't need to mess with generators and fancy projects. However, such merging would be pointless for application or DLL development. Debugging a single huge file, then figuring out where the code originated and modifying the real source file, would be troublesome. – user7860670 Sep 20 '17 at 19:10

2 Answers


not to mention that a (clean) build of the project takes ~30 minutes

The main drawback is that a change to any part of the code requires the entire program to be recompiled from scratch. If compilation takes a minute, this is probably not significant, but if it takes 30 minutes, it's going to be painful; it destroys the make-a-change -> compile -> test workflow.
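
For contrast, here is a minimal sketch of the conventional split (the file names are invented for illustration); with separate translation units, an edit to one .cpp file forces recompilation of only that one object file:

// foo.hpp - declaration only; changes rarely
#ifndef FOO_HPP
#define FOO_HPP
int foo(int x);
#endif

// foo.cpp - definition; an edit here recompiles only foo.o
#include "foo.hpp"
int foo(int x) { return x * 2; }

// main.cpp - depends only on foo.hpp, never on foo.cpp
#include "foo.hpp"
int main() { return foo(21); }

Compiled as separate objects (e.g. g++ -c foo.cpp, g++ -c main.cpp, then a link step), a change inside foo.cpp rebuilds foo.o alone; main.o is only rebuilt when foo.hpp itself changes.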

not to mention that a clean build of the project takes ~30 minutes

Having separate translation units is actually typically quite a bit slower to compile from scratch, but you only need to recompile each unit when it changes, which is the main advantage. Of course, it is easy to destroy this advantage by mistake, by including a massive, often-changing header in all translation units. Separate translation units take a bit of care to get right.
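
One sketch of that care, with invented class names: prefer a forward declaration in a header over including a heavyweight, frequently changing header, so edits to the heavy header don't ripple into every translation unit that includes yours:

// widget.hpp - forward-declares Engine instead of including engine.hpp,
// so translation units that include widget.hpp are not recompiled
// every time engine.hpp changes
#ifndef WIDGET_HPP
#define WIDGET_HPP

class Engine;  // forward declaration: enough for pointers and references

class Widget {
public:
    explicit Widget(Engine& engine) : engine_(&engine) {}
    void draw();   // defined in widget.cpp, which does include engine.hpp
private:
    Engine* engine_;
};

#endif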

These days, with multi-core CPUs, the slower build from scratch is mitigated by the parallelism that multiple translation units allow (the disadvantage may even be overcome entirely if the sizes of the individual translation units happen to hit a sweet spot and there are enough cores; you'd need some thorough measurement to find out).

Another potential drawback is that the entire compilation process must fit in memory. This only becomes a problem when it needs more than the free memory on your developers' workstations.

In conclusion: the problem is that the one-massive-source-file approach does not scale well to big projects.


Now, a word about the advantages, for fairness:

optimizations that will probably end in redundant or obscure results

Actually, a single translation unit is easier to optimize than separate ones. This is because some optimizations, inline expansion in particular, are not possible across translation units: they depend on definitions that are not visible in the translation unit currently being compiled.
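
A small sketch of the effect (the function names are invented): without whole-program techniques, the compiler can only inline a call whose definition is visible in the translation unit it is currently compiling:

// helper.cpp - the definition lives in this translation unit
int add(int a, int b) { return a + b; }

// main.cpp - only the declaration is visible here
int add(int a, int b);   // declaration, no body

int main() {
    // Cannot be inlined when compiling main.cpp alone: the body of
    // add() sits in another translation unit. In a single translation
    // unit (or with the definition in an included header), the compiler
    // could expand this inline and constant-fold it to `return 3;`.
    return add(1, 2);
}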

This optimization advantage has been mitigated since link-time optimization became available in stable releases of popular compilers, as long as you're able and willing to use a modern compiler and to enable link-time optimization (which might not be enabled by default; with GCC and Clang it is the `-flto` flag).


PS. It's very unconventional to name the single source file with the extension .hpp.

eerorika
  • Interesting point about memory, I hadn't considered that yet. About the only source file having the extension `.hpp`: I find it even more unconventional that this __source__ file actually looks like a header and not a source file. I only realized that was the case when making changes to the makefile. – dhein Sep 20 '17 at 11:06
  • @dhein well, that's just a consequence of this method. The single source file is just a tool for the compilation process, not something that should include program logic. – eerorika Sep 20 '17 at 11:09
  • Actually, if you have a multi-core machine you can use make -jN or ninja to compile several compilation units in parallel; Visual Studio will parallelize at least separate projects, even without a distributed build, so I would not restrict this advantage to distributed compilation farms. – PaulR Sep 20 '17 at 11:46
  • @PaulR I was imagining that a compiler could share its load across multiple cores when compiling a single compilation unit as well. Perhaps I was wrong. – eerorika Sep 20 '17 at 12:16
  • For Visual C++ I think you may be right, if I remember my CPU load during compilation correctly (though that may have only applied during link-time optimization), but I do not think gcc and clang use multiple threads (again, possibly except for some forms of LTO). See also https://softwareengineering.stackexchange.com/questions/322494/do-compilers-utilize-multithreading-for-faster-compile-times – PaulR Sep 20 '17 at 14:04
  • The `.hpp` file extension *is* somewhat conventional/common for this pattern, which, while not the most widespread or scalable solution, is relatively popular among a minority of C++ programmers, as I understand it. Popular enough that I was exposed to it within the first couple of years of my Computer Science degree, at least - I don't claim to be an expert on C++ demographics. Some developers find it easier to reason about - and the reply to that is much like the crux of this answer: it can be advantageous for reasoning about or organizing some projects up to a point, but it doesn't scale well. – mtraceur Dec 21 '17 at 11:38

The first thing I would like to mention are the advantages of a project with a Single Compilation Unit (SCU):

  • Drastic compilation time reduction. This is actually one of the primary reasons to switch to SCU. With a regular project of n translation units, compilation time grows roughly linearly with each new translation unit added, while with SCU it grows roughly logarithmically, and adding new units to a large project hardly affects compilation time.
  • Compilation memory reduction, both disk and RAM. The "big" translation unit will obviously occupy considerably more memory than any individual "small" translation unit containing only part of the project, but the cumulative size of the small ones will greatly exceed the size of the "big" one.
  • Some optimization benefits. Obviously, you get "everything is inline" automatically.
  • No more fear of "compilation from scratch". This is very important because it is what a CI server performs.

Now to disadvantages:

  • Developers must maintain strict header organization discipline: header guards, consistent ordering of #include directives, mandatory inclusion of the headers directly required by the current file, proper forward declarations, consistent naming conventions, etc. (see the sketch after this list). The problem is that there are no tools to help developers with this, and even minor violations of header organization may lead to messy failed-build logs.
  • A potential increase in the total number of files in the project. See this answer for details.
  • No more "it's compiling" excuses for wooden sword fencing.
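
To illustrate the discipline demanded by the first point above, here is a hedged sketch (file and function names invented) of what a single "unit" might look like when absolutely everything in the project is written as a proper header:

// counter.hpp - in an SCU project every file must behave like a real
// header: include guard, includes for exactly what it uses itself,
// and definitions that are legal inside the one big translation unit
#ifndef COUNTER_HPP
#define COUNTER_HPP

#include <string>

inline std::string describe_count(int n) {
    return "count: " + std::to_string(n);
}

#endif

Each such file guards itself and includes exactly what it uses, so it can be pulled into the single big translation unit in any order.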

P.S.

  • The SCU organization in your case is kind of "soft-boiled". By that I mean that the project still has translation units that are not proper headers. Typically this scenario happens when an old project is being converted to SCU.
  • If building your project with SCU takes ~30 minutes, I have a feeling that either it is not the fault of the project organization (it could be an antivirus, no SSD, recursive template bloat) or it would take several hours without SCU.
  • Some numbers from my experience converting an existing project to SCU: compilation time dropped from ~14 minutes to ~20 seconds, with a 3x reduction in executable size.
  • Real-world use cases: CppCon 2014: Nicolas Fleury, "C++ in Huge AAA Games"; Chromium Jumbo / Unity builds ("it can save hours for a full build").
  • I might be exaggerating a bit, but it seems to me that the entire concept of "multiple translation units" (and static libraries as well) should be left in the past.
user7860670
  • "of such project structure" Which one you mean? As you say time reduction and then explain that its log isntead of linear, what isn't an reduction AFAIK. nice answer, but I would love if you make more clear where you refer to what :) – dhein Sep 20 '17 at 12:15
  • Thanks, nice input and very interesting. But I'm not sure whether you are talking about SCU in general or about the specific case I described? Or IS what he did actually the correct concept of SCU? If it isn't, could you maybe go into a bit of detail about the possible issues of a wrongly implemented SCU? Because while this is certainly interesting, it doesn't help me find reasons against the structure in place (as opposed to SCU in general). – dhein Sep 20 '17 at 12:25
  • @dhein I'm talking about SCU in general. But I've referred to the kind of project that you've got (that is, compilation of a single file that includes a huge list of `.cpp` files) as "soft-boiled" because it is a bit of a "cutting corners" approach. Maintaining huge lists seems to be quite a burden. Proper SCU (as I see it) implies writing absolutely everything as a *header-only library* (that is, compiling a single file that only includes an "application header-only library" header with the entry point; no "huge lists" of any kind). – user7860670 Sep 20 '17 at 12:37
  • "The entire concept of 'multiple translation units' (and static libraries as well) should be left in the past" - this is a very strong statement, and it does not follow from the provided argumentation/cases. Yes, mono-builds have certain advantages. I find this kind of generalization harmful. – AMA Sep 20 '17 at 14:36
  • @dhein A good example to consider on this topic is the `sqlite` project. They *release* the code as one big `.c` file, but if you browse the code as it's being developed and version-controlled, it's properly organized into separate `.c` files. – mtraceur Dec 21 '17 at 11:45