
I have a file which, when compiled to an object file, has the following sizes:

  • On Windows, using MSVC, it's 8 MB.
  • On macOS, using clang, it's 8 MB.
  • On Linux (Ubuntu 18.04 or Gentoo), using either gcc or clang, it's 20 MB.

The file (detailed below) is a representation of (a part of) a Unicode table along with character properties. The encoding is UTF-8.

It occurred to me that the problem might be that libstdc++ can't handle the file well, so I tried libc++ with clang on Gentoo, but it made no difference (the object file size remained the same).

Then I thought that it might be some optimization doing something odd, but once again going from -O3 to -O0 brought no size improvement.

The file includes UnicodeTable.inc on line 50. UnicodeTable.inc contains a std::array of the Unicode code points.

I tried changing the std::array to a C-style array, but again the object file size did not change.
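
To give an idea of the shape of the data, the table looks roughly like this (a sketch only; the struct name and string-pointer fields are inferred from the comments and the answer below, the entry count comes from the answer, and kCodePoints is a placeholder name):

#include <array>

struct RawCodePoint {
  const char* original;     // pointer into a UTF-8 string literal
  const char* normal;
  const char* folded_case;
  const char* swapped_case;
  // ... plus several character property fields ...
};

const std::array<RawCodePoint, 132624> kCodePoints = {{
  { "a", "a", "a", "A" },
  // ... ~130k more entries ...
}};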

I have a preprocessed version of CodePoint.cpp, which can be compiled with $CC -xc++ CodePoint.i -c -o CodePoint.o. CodePoint.i contains about 40k lines of STL code and about 130k lines of the Unicode table.

I tried uploading the preprocessed CodePoint.i to gists.github.com and to paste.pound-python.org, but both refused the 170k-line file.

At this point I'm out of ideas and would greatly appreciate any help in tracking down the source of the "bloated" object file size.

bstaletic
  • What is `sizeof (RawCodePoint)` on each platform? – PaulR Jun 27 '18 at 13:49
  • I assume you are using the .o file as part of a shared library or executable at some point -- is that executable also 12MB larger under Linux? Even after you run `strip` on the executable? – Jeremy Friesner Jun 27 '18 at 13:51
  • How big are the executables? Since object files are temporary, is the size important? – Thomas Matthews Jun 27 '18 at 14:11
  • What was the full compiler command line you used? – PaulR Jun 27 '18 at 14:11
  • This question may help with analyzing the result further: https://stackoverflow.com/questions/11720340/tool-to-analyze-size-of-elf-sections-and-symbol – PaulR Jun 27 '18 at 14:15
  • Sounds like `-fmerge-constants` isn't happening on your Linux platform. – Eljay Jun 27 '18 at 17:52
  • @PaulR The `sizeof(RawCodePoint)` is the same - 12 bytes. – bstaletic Jun 28 '18 at 10:58
  • @JeremyFriesner The shared library is 12MB larger on Linux as well. That's how I found out that the object file is so large. – bstaletic Jun 28 '18 at 11:09
  • The full command line looks like the following: `/usr/bin/c++ -DUSE_CLANG_COMPLETER -DYCMD_CORE_VERSION=39 -DYCM_EXPORT="" -Dycm_core_EXPORTS -I/home/bstaletic/Temp/ycmd/cpp/ycm -I/home/bstaletic/Temp/ycmd/cpp/ycm/ClangCompleter -isystem /home/bstaletic/Temp/ycmd/cpp/BoostParts -isystem /home/bstaletic/Temp/ycmd/cpp/pybind11 -isystem /usr/include/python3.6m -isystem /home/bstaletic/Temp/ycmd/cpp/llvm/include -fvisibility=hidden -O3 -DNDEBUG -fPIC -std=c++11 -o CMakeFiles/ycm_core.dir/CodePoint.cpp.o -c /home/bstaletic/Temp/ycmd/cpp/ycm/CodePoint.cpp` – bstaletic Jun 28 '18 at 11:09
  • @Eljay Manually passing `-fmerge-constants` didn't improve the size at all. – bstaletic Jun 28 '18 at 11:11
  • What do the object dumps of CodePoint.cpp.o look like on the 3 platforms? That should give you some insight as to what is going on that is different on Linux. – Eljay Jun 28 '18 at 11:24
  • Sorry, I have not used `objdump` before. Would you mind telling me what to look out for? Should I just get the output of `objdump --full-contents CodePoint.o`? – bstaletic Jun 28 '18 at 11:40
  • @PaulR I was only now able to try the `nm` and `size` from that other stackoverflow thread. `nm` on macOS just said that `sizes with -print-size for Mach-O files are always zero` while `size` didn't report anything close to 20MB. – bstaletic Jun 28 '18 at 21:14
  • Here's what happens when I do `size -A -d ycm_core.so` (which is the name of the shared object on both Linux and macOS): https://gist.github.com/bstaletic/360197714ad7ceb6df7a14589d698e50 – bstaletic Jun 28 '18 at 21:50

1 Answer

From the output of `size` you linked you can see that there are 12 MB of relocations in the ELF object (section `.rela.dyn`). If a 64-bit relocation takes 24 bytes and you have 132624 table entries with 4 pointers to strings each, that pretty much explains the 12 MB difference (132624 * 4 * 24 = 12731904 ≈ 12 MB).
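
As a sanity check on that arithmetic: a 64-bit ELF Elf64_Rela record consists of three 64-bit fields (r_offset, r_info, r_addend), i.e. 24 bytes, and one record is emitted per pointer:

#include <cstdio>

int main() {
  constexpr long kRelocSize = 24;     // sizeof(Elf64_Rela): 3 x 8 bytes
  constexpr long kEntries   = 132624; // table entries
  constexpr long kPointers  = 4;      // string pointers per entry
  // 132624 * 4 * 24 = 12731904 bytes, roughly 12 MB
  std::printf("%ld\n", kEntries * kPointers * kRelocSize);
}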

Apparently the other formats either use a more efficient relocation type or link the references directly and just relocate the whole block together with the strings as one piece of memory.

Since you are linking this into a shared library, the dynamic relocations will not go away.

I am not sure if it is possible to avoid this with the code you currently use. However, I think a Unicode code point must have a maximal size. Why don't you store the code points by value in char arrays in the RawCodePoint struct? The size of each code point string should be no larger than the pointer you currently store, and the locality of reference of the table lookup may actually improve.

#include <array>
#include <cstddef>
#include <cstdint>

// A UTF-8 encoded code point is at most 4 bytes long.
constexpr size_t MAX_CP_SIZE = 4; // Check if that is correct

struct RawCodePointLocal {
  // Storing the bytes inline (instead of const char*) means the table
  // contains no pointers, so no dynamic relocations are needed.
  const std::array<char, MAX_CP_SIZE> original;
  const std::array<char, MAX_CP_SIZE> normal;
  const std::array<char, MAX_CP_SIZE> folded_case;
  const std::array<char, MAX_CP_SIZE> swapped_case;
  bool is_letter;
  bool is_punctuation;
  bool is_uppercase;
  uint8_t break_property;
  uint8_t combining_class;
};

This way you should not need relocations for the entries.
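
For illustration, a hypothetical entry would then be initialized by value like this (the values are made up; unused bytes are zero-filled by aggregate initialization):

constexpr RawCodePointLocal kLetterA = {
  { 'a' },  // original
  { 'a' },  // normal
  { 'a' },  // folded_case
  { 'A' },  // swapped_case
  true,     // is_letter
  false,    // is_punctuation
  false,    // is_uppercase
  0,        // break_property
  0         // combining_class
};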

PaulR
  • Thanks for the help! That surely pointed me in the right direction. Replacing `char*` with (appropriately sized) `char[]` made `.rela.dyn` drop to negligible size, but then MSVC took forever to compile (however, that's another issue). – bstaletic Jun 30 '18 at 05:52