struct hack - zero sized array

Question

#include <iostream>
using namespace std;

struct node1{
    char b[3];
    int c[0];
};

struct node2{
    int c[0];
};

struct node3{
    char b[3];
};


int main() {

    cout << sizeof(node1) << endl;  // prints 4
    cout << sizeof(node2) << endl;  // prints 0
    cout << sizeof(node3) << endl;  // prints 3
}

My question is: why does the compiler allocate 0 bytes for int c[0] in node2 but 1 byte for it when it is part of node1? I'm assuming this 1 byte is the reason sizeof(node1) returns 4, since without it (as in node3) the size is 3. Or is that due to padding?

Also, shouldn't node2 have enough space to hold a pointer to an array (which will be allocated further down in the code as part of the flexible array/struct hack)?

@BaummitAugen: Well, I feel silly. CAN REPRODUCE: http://coliru.stacked-crooked.com/a/f1a26629a75b8d01 — Mooing Duck, Oct 20 '15 at 21:20
Hypothesis: `node1` has padding to ensure `node1` and `node1.c` are aligned on a 4-byte boundary, since `int` has a 4-byte alignment requirement. There is no special case for a 0-sized array, so `int c[0]` isn't ignored. — John Kugelman, Oct 20 '15 at 21:21
@MooingDuck The link you posted already contains the answer: This is not standard C++. — Baum mit Augen, Oct 20 '15 at 21:21
As zero length arrays are not a thing in C++, please tell us what dialect of C++ you are talking about. — Baum mit Augen, Oct 20 '15 at 21:22
What compiler are you using? MSVC gives 1 for `sizeof(node2)`, but with a warning that a 0-size array is nonstandard. So the answer is *you are invoking non-specified behaviour*, so the results are implementation dependent! — Serge Ballesta, Oct 20 '15 at 21:27
@SergeBallesta Compiler is Apple LLVM 7.0. After I turned on pedantic warnings in XCode, I see warnings as "Zero size arrays are an extension" — Electrix, Oct 20 '15 at 21:41
One question per question please. This is not very specific. Are you sure you didn't mean to start a chat room conversation, Nikhil? — Lightness Races in Orbit, Oct 22 '15 at 09:50

score 2 · Answer 1 · answered Oct 20 '15 at 21:52

Yes, it's about padding/alignment. If you add __attribute__((__packed__)) to the end of each struct definition (useful when writing device drivers), you'll get 3 0 3 for your output.

If node1 had declared c[1] instead, its size would be 8, not 7, because the compiler aligns c to an int boundary. With packed, sizeof would be 7.

alain · Answer 2 · 2015-10-22T09:47:38.250

Yes, padding makes the difference. The reason why node1 has a padding byte, while node3 doesn't, lies in the typical usage of zero-length arrays.

Zero-length arrays are typically used with casting: you cast a pointer to a larger (possibly variable-sized) object to a pointer to the struct containing the zero-length array. Then you access the "rest" of the large object through the zero-length array, which, for this purpose, has to be aligned properly. The padding byte is inserted before the zero-sized array so that the ints are aligned. Since you can't do that with node3, no padding is needed.

Example:

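#include <cstddef>   // size_t
#include <cstring>   // memcmp

struct Message {
    char Type[3];
    int Data[];    // it compiles without putting 0 explicitly
};

// Assumed signature: takes the payload and the number of ints it holds.
void HandleGet(int* data, size_t count);

void ReceiveMessage(unsigned char* buffer, size_t length) {
    // Too short to hold even the header (Type plus its padding).
    if (length < sizeof(Message))
        return;
    // buffer must be suitably aligned for int, otherwise reading Data is undefined.
    Message* msg = (Message*)buffer;
    if (!memcmp(msg->Type, "GET", 3)) {
        HandleGet(msg->Data, (length - sizeof(Message)) / sizeof(int));
    } else if....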

Note: this is rather hackish, but efficient.

score 0 · Answer 3 · edited Oct 20 '15 at 21:48

c doesn't allocate one byte in node1. It's because of the padding added to b.

For b to be easily obtainable by a 32-bit CPU, it is four bytes big. 32-bit CPUs can read 4 consecutive bytes from memory at a time. To read three, they have to read four and then discard the one that isn't needed. Therefore, to optimize this behavior, the compiler pads the struct with some bytes.

You can observe similar compiler optimizations when values are pushed on the stack (that is, arguments or local variables are allocated). The stack is always kept aligned to the CPU's data bus size (commonly 32 or 64 bits).

edited Oct 20 '15 at 21:48 · Mooing Duck
answered Oct 20 '15 at 21:22 · cadaniluk

Well, it _is_ allocating one byte, but it is because the padding is required by `b`. – Mooing Duck Oct 20 '15 at 21:25
This answer seems to imply that `node3` should have padding, too, but it doesn't. – John Kugelman Oct 20 '15 at 21:25
@MooingDuck I think the "It" was referring to `c` (can't remember anymore :-)). Will clear that up. – cadaniluk Oct 20 '15 at 21:26
@JohnKugelman yes exactly, which is the purpose of adding node3 to this program – Electrix Oct 20 '15 at 21:27
@JohnKugelman Wait, why? If you have zero bytes then why do you need memory to be allocated at all? – cadaniluk Oct 20 '15 at 21:28
@JohnKugelman: No, only elements in a struct, which are larger than 1 will need padding for alignment purposes. So, a struct with `char x[3];` does not need alignment, but `char x[3]; int y;` would have padding to ensure `y` is aligned. – Mats Petersson Oct 20 '15 at 21:28
@cad: http://stackoverflow.com/questions/119123/why-isnt-sizeof-for-a-struct-equal-to-the-sum-of-sizeof-of-each-member – Mooing Duck Oct 20 '15 at 21:49
@MatsPetersson Yes, I understand that, but that's not what the answer says. The answer says `b` is four bytes big to be easily obtainable. That's not the reason: it's all about `c`. – John Kugelman Oct 20 '15 at 22:13
@JohnKugelman: Yeah, I saw the same problem with this answer. `b` is an array, so alignment isn't optimizing for loading all of it at once. Also, many CPUs do support unaligned access to 32bit words. It's only slower if they cross a cache-line boundary. It's a fair point though; either the compiler has to emit code to do aligned loads and combine them, or the CPU internals have to do it under the hood. – Peter Cordes Oct 20 '15 at 22:44

score 0 · Answer 4 · answered Oct 20 '15 at 23:45

int main() {

  cout << sizeof(node1) << endl;  // prints 4
  cout << sizeof(node2) << endl;  // prints 0
  cout << sizeof(node3) << endl;  // prints 3
}

The main function queries the size of the user-defined structs, not of the array members. sizeof() returns the number of bytes allocated to the struct, with each character in the character array being allocated 1 byte. A character array is really a C-style string, which is terminated by the sentinel character '\0'. sizeof(node1) likely includes the byte allocated to hold the sentinel character, because another variable follows the array and so it reads over it, while sizeof(node3) does not include the sentinel, because there the string and the struct end together.

Sentinel characters have nothing to do with this behavior. – aschepler Oct 21 '15 at 00:34
