3
int main[-1U] = {1};

It's very simple. Compile it with

gcc foo.c

No file will be generated (at least I can't find one with `lsof` while GCC runs in the background via `gcc foo.c &`), yet 16 GiB of secondary storage (hard drive) will be eaten.

How does this work?

iBug
  • @Bob__ GCC defaults to `-O0`. – iBug Nov 24 '17 at 15:22
  • It just created an `a.out` file of 16 GiB on my hard drive. You cannot find the file with `lsof` because `of` stands for "open files". The file is not open any more after `gcc` completes. – axiac Nov 24 '17 at 15:26
  • @axiac I run `lsof` with gcc in the background. – iBug Nov 24 '17 at 15:31
  • It doesn't matter, @iBug. The source is so short that `gcc` will compile it before you have a chance to see anything. "In the background" is not the same thing as "suspended". – John Bollinger Nov 24 '17 at 15:35
  • @JohnBollinger Shouldn't I be able to observe it while it's generating `a.out`? – iBug Nov 24 '17 at 15:37
  • You could use `ltrace` to see the system call that generates the file. Then you'll see why it is next to impossible to "observe" its creation. – MrPaulch Nov 24 '17 at 15:38
  • @MrPaulch I see `vfork` and `waitpid`, what's wrong? – iBug Nov 24 '17 at 15:45
  • @MrPaulch Do you mean something like `as` or `ld` is actually generating that file instead of `gcc`? – iBug Nov 24 '17 at 15:48
  • If you can catch it while it's actually running then yes, @iBug. I had supposed that that would be difficult -- even with such a large static array -- but maybe not. In that case, though, `lsof` might still be tricky to use for the purpose because you don't necessarily know either which process you're looking for or what file name. The actual process writing the file is probably not `gcc` itself, and the file is not necessarily named `a.out` or any other predictable name while it is being written. – John Bollinger Nov 24 '17 at 15:56
  • @JohnBollinger I caught `/bin/as` and `/tmp/abcdefgh o`. That's probably an answer. – iBug Nov 24 '17 at 15:57
  • Yes. The actual writing of the array is done by the assembler `/bin/as` into a temporary object file. After that the linker will do its thing, creating the `a.out`. But it's smart enough not to make a byte-for-byte copy of the `.o` file. All in all, `a.out` will be open for a few milliseconds. – MrPaulch Nov 24 '17 at 16:42

5 Answers

7

You can solve this mystery with a simple printf:

printf("%zu\n", (size_t)-1U);

This produces 4294967295 (demo) - enough to fill 16 GiB on systems where sizeof(int) is equal to 4.

To see why this takes so much space in the file system, compile foo to an object file, and run the size utility on it.

I modified the program to shrink the array to 1,000,000 elements. Here is the output that you get from running size:

$ gcc -c foo.c
$ size -A -d foo.o
foo.o  :
section        size   addr
__text            0      0
__data      4000000      0
Total       4000000

The __data section contains initialized data. The compiler fills it in because you supplied the {1} initializer. If you omit the initializer, the size of the binary on disk shrinks to a few kilobytes, because the array is placed into the uninitialized segment instead:

$ size -A -d foo.o
foo.o  :
section     size   addr
__text         0      0
Total          0
Sergey Kalinichenko
4

Your code is wrong (it has undefined behavior), since main should be a function, not an array. You should be scared, and it could have been worse.

But I guess that -1U is actually 4294967295 on your machine.

You are asking GCC to fill a four-billion-element array, so don't be surprised that GCC needs a lot of memory for that (it has to keep some representation of that monster around at compile time).

The generated executable contains four billion four-byte integers, so it needs 16 gigabytes. And yes, it takes a long time to compile. Finally, the linking phase fails:

ibug.c:1:5: warning: ‘main’ is usually a function [-Wmain]
 int main[-1U] = {1};
     ^~~~
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xb): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in ibug
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in ibug
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0xba): relocation truncated to fit: R_X86_64_PC32 against `.bss'
collect2: error: ld returned 1 exit status

That ld process needed 16 gigabytes of (virtual) memory, as observed with top. I guess you experienced thrashing. The entire gcc command took 3:14 minutes on my desktop (most of which was spent in the ld process it started). Since ld failed, the monster files it had created were removed by the gcc driver program.

(I have 32 gigabytes of RAM on my machine: Linux/Debian/Sid/x86-64, GCC 7, compiling with gcc -Wall ibug.c -o ibug.)

Basile Starynkevitch
  • Note it's **secondary storage** (i.e. hard drive), not RAM. – iBug Nov 24 '17 at 15:21
  • It's a static *initialized* array of 4G ints. Assuming each int is 4 bytes, you end up with a 16 GB binary. Of course the initialization of the `0` elements could be optimized out, but I'm not sure the compiler is smart enough to do that. – Eugene Sh. Nov 24 '17 at 15:22
4

Unsigned arithmetic is modulo one more than the maximum value of the given type.

Consequently -1U is -1 + UINT_MAX + 1, which is 2^32 - 1 on an architecture where unsigned is uint32_t. If your sizeof(int) == 4, then int x[-1U] = {1}; will need 4*(2^32 - 1) B ≈ 16 GiB of storage, and that storage isn't elided because the array isn't all zeros.

The fact that your x is main plays no role (except that its being main will allow the linker to complete the linking, even though that will lead to a corrupt executable as your main isn't a function).

int x[-1U] = {1};
int main(){}

will require over 16 GiB just as well, and this example should be perfectly defined (as long as your implementation can handle an array that big), unlike your example, where main has a type it shouldn't have.

iBug
Petr Skocik
  • Does that program just create an array containing `2^32` elements of type `int` ? – Hollyol Nov 24 '17 at 15:34
  • That's about right, @Hollyol, but it would be clearer to say that the program *contains* an array of (actually 2^32 - 1) `int`s. The array has static duration, so it exists already when the program starts. And the whole thing will appear as data in the program image because one of its elements is initialized to non-zero. – John Bollinger Nov 24 '17 at 15:39
  • @Hollyol Assuming a common implementation (because the wrongly typed main technically makes the program undefined), it will simply cause a successfully loaded program to have a global array of 2^32 ints (`int main[-1U];`). The program will segfault once the startup code tries to call this `main` as a function. – Petr Skocik Nov 24 '17 at 15:46
1

Uninitialized statically allocated arrays usually go to the .bss section. They are implicitly initialized to zeroes. This data does not need to be stored inside the binary, because the startup code "knows" it should fill this section with zeroes at load time. So we can say that uninitialized arrays, even statically allocated ones, do not add to the binary's size.

On the other hand, data initialized to non-zero values, as in your case, goes into the .data section, which is part of the resulting binary file, because the initialization data has to be stored somewhere. In your case you have sizeof(int) * UINT_MAX bytes (-1U == UINT_MAX) of initialized data (presumably equating to 16 GiB), all of which adds to the resulting binary.

Eugene Sh.
0

Perhaps you have virtual memory enabled on your system, so when you try to allocate 16 GB of data in memory, a file of that size appears on the hard disk because of virtual memory.

16 GB because you are allocating 4294967295 ints of 4 bytes each.

NoOne