1

I've got a C++ project that builds in several configurations (not just Debug/Release), and it's a rather massive project. Let's call it KitchenSink.vcproj

I suspect that many pieces of this project build identically, regardless of the configuration. For example, building with or without Unicode support will not matter for a source file that doesn't use strings.

This would lead to the same source file being compiled in multiple configurations, but generating (effectively) the same .obj file. It doesn't generate the same file, because timestamps and the like are embedded in the file, but all the functional pieces of the object file would be the same.

Is there a way to check this? I would like to extract pieces of KitchenSink to their own, simpler, projects, where they only need to be built once. This would speed up build times, and simplify our codebase. But I need a way to automatically find the parts of the code that build the same, regardless of configuration. Is there an easy way to do that?

EDIT: Clarifying my point. Imagine the following file:

// Some header file
int calculate_something(int a, int b);

// The source file
int calculate_something(int a, int b) {
    return a * b;
}

Now, that source file has nothing to do with Unicode. So, if we build it in a Unicode configuration, and then build it again with a MultiByte configuration, we're just wasting time. We could put it into its own static library, that's built without Unicode support, and then that new lib could be used by my other projects. There's nothing risky in that.

I just need to find these files that can be safely moved to a separate project.

EDIT: Further clarification:

KitchenSink.vcproj has the following files (among others)
    StringUtils.h
    StringUtils.cpp
    MathStuff.h
    MathStuff.cpp

Now, if you build KitchenSink in Unicode, and again in MultiByte, you will build StringUtils.obj twice, and MathStuff.obj twice. Obviously, this is necessary for StringUtils.obj, since it will be different in Unicode and MultiByte. But MathStuff.obj should build the exact same.

So, I would like to rearrange/restructure/refactor to the following:

KitchenSink.vcproj has the following files (among others)
    StringUtils.h
    StringUtils.cpp

NewProject.vcproj has the following files
    MathStuff.h
    MathStuff.cpp

Now, KitchenSink can be built in its multiple configurations, while NewProject can be built with just a single Debug/Release option.

Again, I'm NOT talking about SHARING obj files. I'm talking about removing cpp/h files from one project, and putting them in another.

Also note that Unicode/Multibyte is an example of a project with multiple configurations. The reality in my project is actually more complicated, so each source file is compiled 4 times, rather than the 2 that would occur with Unicode/Multibyte.

Tim
  • I +1ed this. However, I think you would do well to modify the question wording. Comparing object files really is quite ludicrous (especially given the goal; comparing symbols with external linkage would make some sense). People might not (a) understand the question readily (b) take it seriously – sehe Oct 25 '11 at 20:57
  • You wouldn't really need to move them to a separate project. I suppose with MSBuild, NAnt or nmake (others too) you should be perfectly able to specify which object files to link, and just link the same object files into several targets. – sehe Oct 25 '11 at 20:59
  • @sehe, we're trying to break this monstrous project into smaller, more manageable entities anyway. – Tim Oct 25 '11 at 21:02
  • In that case I suggest manual labour, perhaps facilitated by [LLVM dependency analysis](http://llvm.org/docs/Passes.html) or [Clang analysis](http://clang-analyzer.llvm.org). [CppDepend](http://www.cppdepend.com/) has features for this too. You can still use the techniques I sketched to detect whether a _translation unit_ varies per build configuration. – sehe Oct 25 '11 at 21:08

4 Answers

2

On Linux and Cygwin/MinGW, ccache helps with this: it detects identical preprocessed sources compiled with identical compilation flags. SCons (a make replacement written in Python) can do the same.

For Visual Studio, I'm afraid you would be looking at IncrediBuild.

That said, this answer lists some references to other candidates (projects in progress): Is there a Ccache for Visual Studio?
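The idea of keying on preprocessed source plus flags can be sketched roughly as follows. This is only an illustration, not how ccache is implemented: `cache_key` and `is_preprocessor_only_flag` are hypothetical names, the list of preprocessor-only options (`/D`, `/U`, `/I`) is a simplifying assumption, and `std::hash` stands in for a real content hash.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: true for MSVC-style options assumed to affect only
// the preprocessor (/D, /U, /I). Whatever effect they have is already
// visible in the preprocessed text, so they are left out of the key.
bool is_preprocessor_only_flag(const std::string& flag) {
    return flag.rfind("/D", 0) == 0 || flag.rfind("/U", 0) == 0 ||
           flag.rfind("/I", 0) == 0;
}

// Hypothetical key: hash of the preprocessed source plus the remaining
// flags. std::hash is NOT collision-resistant; a real tool would use a
// proper content hash.
std::size_t cache_key(const std::string& preprocessed_source,
                      const std::vector<std::string>& flags) {
    std::string material = preprocessed_source;
    for (const auto& flag : flags) {
        if (!is_preprocessor_only_flag(flag)) {
            material += '\n';
            material += flag;
        }
    }
    return std::hash<std::string>{}(material);
}
```

With such a key, two configurations that differ only in `/D` options but preprocess to the same text map to the same key, while a change in the preprocessed text or in a code-generation flag gives a different one.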

sehe
  • That wouldn't help at all. These are items built from the same source file, with different `/D` options, but with the same resulting functional binary object. – Tim Oct 25 '11 at 21:00
  • @Tim: That's why you check AFTER preprocessing, all `/D` macros are already substituted. – Ben Voigt Oct 25 '11 at 21:04
  • Hmm, based on your comment below, are you suggesting I only run the preprocessor phase of the build, and diff those files? That sounds like it might work. – Tim Oct 25 '11 at 21:05
  • @Tim: you could use such techniques to detect translation units that _do vary_ for build configs. _Then_ you can use the knowledge to chart the areas that could benefit from refactoring into smaller/different components – sehe Oct 25 '11 at 21:11
0

Well, you could use diff -b to compare .obj files but I think you'd be foolish to go down this route. All it would take is one small change of the source code to render your optimised build process invalid. I would not contemplate doing something like this.

David Heffernan
  • If you are going to be diffing anything, why not compare hash-sums? Or rather, hash-sum the [preprocessed source before compiling] + [compiler flags] – sehe Oct 25 '11 at 20:54
  • @sehe The question specifically mentions different compiler flags (MBCS/Unicode) but identical resulting .obj files. So I think you'd need to check the result of the compilation to know whether or not you could skip it. And that would have to be encoded in some alternative build script. And that would be hopelessly brittle. – David Heffernan Oct 25 '11 at 20:56
  • I'm not trying to on-the-fly change how I build stuff. I'm trying to identify safe targets to be moved to a different (single-configuration) project. There's nothing brittle nor risky about that. – Tim Oct 25 '11 at 20:59
  • @Tim It becomes brittle if you ever change the code of that translation unit. – David Heffernan Oct 25 '11 at 21:01
  • @DavidHeffernan: AFAICT the flags for MBCS/Unicode are _preprocessor_ flags, not compile flags. That's why I suggest hash-summing the preprocessed text for a quick win, and then passing the remaining flags to the compiler. (Note that the OP is explicitly avoiding the link phase for now, so I will follow that) – sehe Oct 25 '11 at 21:02
0

In Visual Studio, the main tool for inspecting binary files generated by the compiler/linker is dumpbin.exe. See "How to compare binary images of the same project builds".

Be aware that the fact that a translation unit does not make use of strings does not mean the Unicode and non-Unicode builds will be the same. The Unicode build affects the use of the Windows API, so the object files will eventually be linked to different symbols in the system libraries.
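A minimal sketch of why that happens, assuming the usual header pattern where a generic name is a macro selecting an `...A` or `...W` function (as `DefWindowProc` does). `NotifyA`, `NotifyW`, `Notify` and `ping` are hypothetical stand-ins so the example builds anywhere; they are not real Windows API names.

```cpp
// Hypothetical stand-ins for an ...A / ...W function pair.
inline int NotifyA(int) { return 'A'; }
inline int NotifyW(int) { return 'W'; }

// Mimics the header pattern: the generic name is a macro that expands
// differently depending on whether UNICODE is defined.
#ifdef UNICODE
#define Notify NotifyW
#else
#define Notify NotifyA
#endif

// No string handling here, yet the preprocessed text (and so the symbol
// this translation unit refers to) depends on the configuration.
int ping(int code) {
    return Notify(code);
}
```

Built with `UNICODE` defined, `ping` calls `NotifyW`; without it, `NotifyA`. The source text is the same in both cases, but the preprocessor output is not.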

Side note: the idea of reusing .obj files between builds, especially between builds with even slightly different configurations, is very weak. There are many subtle issues possible. For example, given exactly the same build configuration, a static library is in theory equivalent to a collection of .obj files. But I have experienced very strange differences and problems when linking my application against a static library vs. linking against the *.obj files of that library. Here is my thread in the MSDN forums where you can find some details related to .obj files as a drop-in replacement for a static library.

mloskot
  • Object files aren't linked yet (you can link _them_ into a PE/COFF/... _linked_ image). They may refer to external symbols, but they will be identical if the source is identical. – sehe Oct 25 '11 at 21:03
  • @sehe: Doesn't matter, there will be symbol references with a particular name, ending in `A` or `W`. – Ben Voigt Oct 25 '11 at 21:04
  • @BenVoigt: but they will be identical if the source is identical. (I know that the source is not identical for Unicode vs ANSI; that was not the point) – sehe Oct 25 '11 at 21:09
  • @sehe Ben explained my point. You have to think of translation unit, not about single .cpp file. Think of comparing preprocessor output, not binaries: if you can guarantee that output of preprocessing x.cpp is the same in all your builds, then my guess is that you are fairly safe reusing x.obj. – mloskot Oct 25 '11 at 21:10
  • @mloskot: apparently I am spending more time reading your comments than you do mine. There was no need to 'explain' that to me :_) – sehe Oct 25 '11 at 21:13
  • @sehe ...I've just noticed your answer where you mention ccache. You confirm my concerns yourself there. ccache decides whether to reuse an .obj file based on the preprocessor output. I can't see any other way. (I read yours, but there is a lack of linear synchronisation between when you write, when I see it and when I respond). – mloskot Oct 25 '11 at 21:14
  • I'm not trying to reuse an obj file. I understand the risks of doing something like that. I'm trying to pull things from one static library to another (simpler) one, and I'm trying to find files that can safely be moved over. – Tim Oct 25 '11 at 21:16
  • @mloskot: So you found out we agree :) My comment was merely pointed at (IMO) confusing usage of the word 'linked to'. _Refer to_ is ok for me – sehe Oct 25 '11 at 21:17
  • @sehe Understood. I've clarified my answer using "will be linked" :) – mloskot Oct 25 '11 at 21:18
  • @Tim Which files do you want to move over then? What do you mean by "pull things from static library"? This is all getting confusing, I'm afraid – mloskot Oct 25 '11 at 21:21
  • @mloskot, I've updated my question. Hopefully you are no longer confused. – Tim Oct 25 '11 at 21:46
0

To see if a project is building the same file in different configurations, but pointlessly, it is best to compare the output of the preprocessor. Comparing object files is simply too prone to failure, and not necessary.

The basic idea is: Run the preprocessor on a file in multiple configurations, and compare the output files. If they're identical, then there's not much point to building that file in the different configurations, and it's a good candidate to be refactored to a different project, with fewer configurations.

In Visual Studio:

  1. Right-click on the Project, and go to Properties
  2. Under Configuration Properties -> C/C++ -> Preprocessor, edit "Generate Preprocessed File"
  3. Set it to Without Line Numbers (/EP /P)
  4. Do a build in ConfigurationA.
  5. The preprocessed files are generated in the same directory as the source file, and NOT in the Configuration subdirectory, so move all the *.i files to the ConfigurationA subdirectory for safe keeping.
  6. Repeat steps 4 and 5 for ConfigurationB (and any other configurations)
  7. Compare the individual *.i files in each Configuration subdirectory. If a source file produces the same .i file in all configurations, it is a good candidate to be extracted to a different (single-configuration) library.
  8. Set Generate Preprocessed File back to its original setting.
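The comparison in step 7 can be automated. Below is a minimal C++17 sketch, assuming the *.i files have been moved into one directory per configuration as in step 5. The function names `read_file` and `identical_in_all` are made up for this illustration, and the comparison is a plain byte-for-byte check.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Read a whole file into a string (binary mode, so bytes are compared as-is).
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Given one directory per configuration, return the names of the .i files
// that exist in every directory with byte-identical contents.
std::vector<std::string> identical_in_all(const std::vector<fs::path>& config_dirs) {
    std::vector<std::string> result;
    if (config_dirs.empty()) return result;

    // Use the first configuration as the reference set of file names.
    std::set<std::string> names;
    for (const auto& entry : fs::directory_iterator(config_dirs.front())) {
        if (entry.is_regular_file() && entry.path().extension() == ".i")
            names.insert(entry.path().filename().string());
    }

    for (const auto& name : names) {
        const std::string reference = read_file(config_dirs.front() / name);
        bool same = true;
        for (std::size_t i = 1; i < config_dirs.size() && same; ++i) {
            const fs::path other = config_dirs[i] / name;
            same = fs::exists(other) && read_file(other) == reference;
        }
        if (same) result.push_back(name);
    }
    return result;  // in sorted order, because std::set iterates in order
}
```

Calling `identical_in_all({"ConfigurationA", "ConfigurationB"})` (directory names are placeholders) would then list the files that are candidates for the single-configuration library. Files that differ only in whitespace would be reported as different by this sketch.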

However, if you're using precompiled headers, that will probably mess this up. The precompiled header may include things that are simply not needed by a given file, but cause the preprocessor output to change unnecessarily.

E.g. SimpleFile.cpp and SimpleFile.h only use basic types, so they don't need to include anything at all. But SimpleFile.cpp includes stdafx.h, because Visual Studio requires every file in a project to include the precompiled header. stdafx.h includes several files, including HighlyConfigurable.h, which has several #ifdef statements and behaves very differently depending on the configuration. Thus, since SimpleFile.cpp includes stdafx.h, which includes HighlyConfigurable.h, the preprocessor output SimpleFile.i will be quite different for each configuration. If stdafx.h were not used, SimpleFile.i would be the same in all configurations.

The simple workaround here is to comment out the entirety of stdafx.h. That may sound drastic, but you're not going to save the file that way - you're just going to follow the steps above to generate preprocessor files for comparison, and then restore stdafx.h to its former glory.

Tim