I wanted to know if there is ANY difference between how g++ compiles an already preprocessed .ii file, and compiling a .cpp file from scratch.

I am asking because building the binaries in my project with a two-step process (preprocessing, then passing the preprocessed file to g++) produces a completely different binary (as seen using objdump).

Preprocess command I am using -

/usr/bin/g++ -fdebug-prefix-map=/buildenv/cmake_build_dir/0=. -O3 -fPIC -g -fvar-tracking-assignments -march=haswell -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 -fconcepts -std=c++20 -Wno-invalid-offsetof -Werror=address -Werror=array-bounds -Werror=c++11-compat -Werror=char-subscripts -Werror=enum-compare -Werror=comment -Werror=format -Werror=main -Werror=maybe-uninitialized -Werror=missing-braces -Werror=nonnull -Werror=parentheses -Werror=reorder -Werror=return-type -Werror=sequence-point -Wstrict-aliasing -Werror=strict-overflow=1 -Werror=switch -Werror=trigraphs -Werror=uninitialized -Werror=unknown-pragmas -Werror=unused-label -Werror=unused-value -Werror=volatile-register-var -Werror=clobbered -Wmissing-field-initializers -Wtype-limits -Werror=uninitialized -Wunused-but-set-parameter -Werror=return-local-addr -fvisibility-inlines-hidden -DBOOST_BIND_GLOBAL_PLACEHOLDERS -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -DHAS_FMA_SUPPORT -I/buildenv/include -I/usr/include/python3.9 -MD -MT <filename.cpp>.o -MF <filename.cpp>.o.d -fdiagnostics-color -E <filename.cpp>.

The preprocessed file is then moved to a tmp directory by another step in my program. Say the final filename is filename.ii.

After this I do the compilation using -

/usr/bin/g++ -fdebug-prefix-map=/buildenv/cmake_build_dir/0=. -O3 -fPIC -g -fvar-tracking-assignments -march=haswell -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 -fconcepts -std=c++17 -fdiagnostics-color -Wno-invalid-offsetof -Werror=address -Werror=array-bounds -Werror=c++11-compat -Werror=char-subscripts -Werror=enum-compare -Werror=comment -Werror=format -Werror=main -Werror=maybe-uninitialized -Werror=missing-braces -Werror=nonnull -Werror=parentheses -Werror=reorder -Werror=return-type -Werror=sequence-point -Wstrict-aliasing -Werror=strict-overflow=1 -Werror=switch -Werror=trigraphs -Werror=uninitialized -Werror=unknown-pragmas -Werror=unused-label -Werror=unused-value -Werror=volatile-register-var -Werror=clobbered -Wmissing-field-initializers -Wtype-limits -Werror=uninitialized -Wunused-but-set-parameter -Werror=return-local-addr -fvisibility-inlines-hidden -c -o <filename>.o <filename>.ii

I can also build the source file to object file directly using the command -

/usr/bin/g++ -fdebug-prefix-map=/buildenv/cmake_build_dir/0=. -O3 -fPIC -g -fvar-tracking-assignments -march=haswell -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 -fconcepts -std=c++17 -fdiagnostics-color -Wno-invalid-offsetof -Werror=address -Werror=array-bounds -Werror=c++11-compat -Werror=char-subscripts -Werror=enum-compare -Werror=comment -Werror=format -Werror=main -Werror=maybe-uninitialized -Werror=missing-braces -Werror=nonnull -Werror=parentheses -Werror=reorder -Werror=return-type -Werror=sequence-point -Wstrict-aliasing -Werror=strict-overflow=1 -Werror=switch -Werror=trigraphs -Werror=uninitialized -Werror=unknown-pragmas -Werror=unused-label -Werror=unused-value -Werror=volatile-register-var -Werror=clobbered -Wmissing-field-initializers -Wtype-limits -Werror=uninitialized -Wunused-but-set-parameter -Werror=return-local-addr -fvisibility-inlines-hidden -DBOOST_BIND_GLOBAL_PLACEHOLDERS -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -DHAS_FMA_SUPPORT  -I/buildenv/include -I/usr/include/python3.8 -c -o <filename.cpp>.o <filename.cpp>

I am using diff <(objdump -D <binary1>) <(objdump -D <binary2>) to get the difference in assembly. The differences are in the instructions themselves: the whole set of generated assembly instructions is different.

  • How do you preprocess the source file? How do you build the preprocessed file to an object file? How do you build the *source* file to an object file? Please show us the commands and all flags/options you use. – Some programmer dude Aug 05 '22 at 12:08
  • Also, what *are* the "differences" you see? How do you use the `objdump` command? Please show us that as well. – Some programmer dude Aug 05 '22 at 12:09
  • What does the *ii* file extension represent? – Thomas Matthews Aug 05 '22 at 14:43
  • ii file is for the preprocessed cpp file, right? – Hardik Aggarwal Aug 06 '22 at 15:04
  • @HardikAggarwal, with gcc, input files ending with `.ii` are treated as C++ code which should not be preprocessed [docs](https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html). – starball Aug 11 '22 at 01:23
  • I noticed that when you generate your .ii, you use `-std=c++20`, but when you compile it, you use `-std=c++17`. Just the standard you tell g++ to use _can_ affect preprocessing: see [this](https://en.cppreference.com/w/cpp/feature_test). Try it out: make a cpp file that just defines a global if a C++20 language feature exists, such as `__cpp_aggregate_paren_init`. Now compare the preprocessor output between `-std=c++17` and `-std=c++20`. I don't know if this is the cause, but it's within the realm of possibility. In particular, I can imagine standard libraries using these feature-test definitions. – starball Aug 11 '22 at 01:37
  • Can you try doing both your preprocessor and compile steps with the _same_ c++ standard and update your post with what changes in the results you see? – starball Aug 11 '22 at 01:38

1 Answer

Note: for anyone wondering, with gcc, input files ending with .ii are treated as C++ code which should not be preprocessed (docs).

The C++ standard defines what should happen at each phase of translation. Any conforming C++ compiler is able to follow all rules defined by the standard.

If you instruct your compiler to separate its preprocessing and compilation phases as you have done, the safest default is to pass the same flags to both phases. Warning and diagnostic-formatting flags shouldn't matter, and you should only need to pass command-line macro definitions to the preprocessor step, but anything else that can control the preprocessor's or compiler's behaviour should probably be the same. Weird things can happen otherwise.
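One way to catch such a mismatch before it reaches the build is to compare the two command lines mechanically. The following is only a rough sketch: `affects_output`, `output_flags` and `mismatched_flags` are hypothetical helpers, and the prefix list (`-std=`, `-O`, `-g`, `-m`, `-f`) is an assumption about which flags matter, not an exhaustive rule. It also ignores flags whose value is a separate token, such as `--param`.

```cpp
#include <algorithm>
#include <initializer_list>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Assumption: these prefixes cover the flags that can influence the
// generated code. Adjust the list for your own build.
inline bool affects_output(const std::string& token) {
    if (token.rfind("-fdiagnostics", 0) == 0) return false;  // formatting only
    for (const char* prefix : {"-std=", "-O", "-g", "-m", "-f"}) {
        if (token.rfind(prefix, 0) == 0) return true;  // token starts with prefix
    }
    return false;
}

// Collect the output-affecting flags of one whitespace-separated command line.
inline std::set<std::string> output_flags(const std::string& command) {
    std::istringstream words(command);
    std::set<std::string> flags;
    for (std::string token; words >> token;) {
        if (affects_output(token)) flags.insert(token);
    }
    return flags;
}

// Flags that appear in exactly one of the two command lines, in sorted order.
inline std::vector<std::string> mismatched_flags(const std::string& a,
                                                 const std::string& b) {
    const std::set<std::string> fa = output_flags(a);
    const std::set<std::string> fb = output_flags(b);
    std::vector<std::string> diff;
    std::set_symmetric_difference(fa.begin(), fa.end(), fb.begin(), fb.end(),
                                  std::back_inserter(diff));
    return diff;
}
```

Calling `mismatched_flags` with the preprocess command and the compile command returns the flags that differ. For the two commands in the question, the only such difference appears to be the `-std=` value.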

Notably, in the commands you are using, when you separate preprocessing and compiling, you use -std=c++20 for preprocessing and -std=c++17 for compilation, and when you don't separate them, you use -std=c++17. This by itself can result in significant differences between the two outcomes. Different C++ standards can have incompatibilities in the language and the standard libraries. The compiler defines the standard feature-test macros according to the standard you tell it to use. Standard library implementations (and any other code, such as Boost, which you are using) can use these macros to vary their behaviour depending on the presence of certain language / standard library features.
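To make the mechanism concrete, here is a minimal sketch of code whose preprocessed form depends on the selected standard. `Point`, `make_point`, `chosen_branch` and `standard_seen_by_preprocessor` are made-up names for illustration; real library headers are far more involved, but they can branch on feature-test macros in the same way.

```cpp
#include <string>

// __cplusplus and the feature-test macros are fixed by the -std= value in
// effect when the preprocessor runs, not when the resulting .ii is compiled.
#if defined(__cpp_aggregate_paren_init)
constexpr bool kHasParenInit = true;
#else
constexpr bool kHasParenInit = false;
#endif

struct Point {
    int x;
    int y;
};

// Hypothetical library function whose body depends on a feature-test macro.
inline Point make_point(int x, int y) {
#if defined(__cpp_aggregate_paren_init)
    return Point(x, y);  // C++20 parenthesized aggregate initialization
#else
    return Point{x, y};  // brace initialization, valid in earlier standards
#endif
}

// Reports which branch survived preprocessing.
inline std::string chosen_branch() {
    return kHasParenInit ? "paren-init" : "brace-init";
}

// The value of __cplusplus that was baked in during preprocessing.
inline long standard_seen_by_preprocessor() {
    return __cplusplus;
}
```

If this file is preprocessed with a compiler that defines `__cpp_aggregate_paren_init` under -std=c++20, the .ii output keeps only the first branch of each `#if`, and compiling that .ii with -std=c++17 then works on different source text than a direct -std=c++17 build of the .cpp would see.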

You may also be interested to learn more about deterministic compilation / reproducible builds.

Aside: If you'd like to see discussion on integrating such a step in a CMake project, see this question.

starball