The difference is that the compiler already "knows" the offset at compile time and doesn't need to compute it at run time, so no memory access is needed and no segfault occurs. That is also why offsetof would not work with an opaque struct: the compiler needs the full definition to know the layout. This becomes particularly clear when you inspect the corresponding x86_64 assembly code. When I ran gcc -S on the following C code:
#include <stdio.h>
typedef struct {
int first;
int second;
int third;
}group;
int main(){
group a;
size_t offset = (size_t) &(((group*)0)->second); /* notice the cast to avoid a warning */
return 0;
}
there were basically only two instructions corresponding to the meat of my C program:
movq $4, -24(%rbp) # move literal value 4 to *(rbp-24)
movl $0, %eax # move literal value 0 to eax (this is just a part of "return 0;" statement)
If I now changed the last two statements in the C code to:
size_t offset = (size_t) &(((group*)0)->third);
return 1;
The assembly code would only differ in those two instructions. They would then read:
movq $8, -24(%rbp)
movl $1, %eax
The 4 and 8 are there because on my machine int is 4 bytes wide. More importantly, the compiler knows what the members of your struct are (that's why an opaque struct wouldn't work: this information is hidden). Since the compiler has this information available from the start, it can and does just "hardcode" it. It doesn't do any dereferencing, because it does not need to.
If I now add the problematic line to my C code:
#include <stdio.h>
typedef struct {
int first;
int second;
int third;
}group;
int main(){
group a;
size_t offset = (size_t) &(((group*)0)->third);
int val = ((group*)0)->second;
return 0;
}
and compile it again, I get the following additional instructions:
movl $0, %eax # move literal value 0 to eax
movl 4(%rax), %eax # dereference the value at *(rax + 4) and save it in eax
movl %eax, -28(%rbp) # move the value saved at eax to the *(rbp - 28)
The first line just stores the literal value 0 in the lower half of the rax register (the upper half is zeroed anyway). The segfault is triggered by the next instruction, when memory at the location rax + 4 = 4 is read in an attempt to store the obtained value in the eax register. In fact, here you can see again that the compiler just knows the offset of the struct group member second by how it simply offsets the location of the struct (saved in rax) by a literal value of 4. It just so happens that this is not valid memory, and hence the OS terminates your program by sending it SIGSEGV.
As said in the comments, in the first example you're not dereferencing anything, but only calculating an address. In the second case you're actually dereferencing a pointer to 0 which leads to a segfault. It's all there in the article you've linked yourself:
Now that the struct offset is “normalized”, we don’t even care about the size of the green member or the size of the structure because it’s easy the absolute offset is the same with relative offset. This is exactly what &((TYPE *)0)->MEMBER does. This code dereferences the struct to the zero offset of the memory.
This generally is not a clever thing to do, but in this case this code is not executed or evaluated. It’s just a trick like the one I’ve shown above with the tape measure. The offsetof() macro will just return the offset of the member compared to zero. It’s just a number and you don’t access this memory. Therefore, doing this trick the only thing you need to know is the type of the structure.
(My emphasis.)