
Having this:

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

typedef struct {
   int a, b;
} str_t;

int main (void) {

   str_t *abc = (str_t*)((((str_t*)0)->a)-offsetof(str_t,a));

   return 0;
}

I have tried to do the same thing this macro does:

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );}) 

The compiler didn't give any specific error, but the resulting program crashed. Why doesn't the macro crash as well?

Herdsman
  • Did the compiler crash, or did your program crash? – Thomas Jager May 06 '20 at 20:04
  • What are you trying to do? `((str_t*)0)->a` is just asking for trouble. – Kevin May 06 '20 at 20:04
  • `((str_t*)0)->a` is accessing memory near `0`. The `container_of` macro never dereferences the `0` pointer, it only uses it for the types. – Thomas Jager May 06 '20 at 20:05
  • @ThomasJager How can it "use it" without dereferencing? It does not make sense – Herdsman May 06 '20 at 20:08
  • It's inside `typeof`. This happens at compile-time, the value of `((type *)0)->member` is never accessed. This construct is used to get the type of the member. – Thomas Jager May 06 '20 at 20:10
  • @ThomasJager Of course my program crashed. How could gcc crash? That can never happen, since gcc just reports errors, so that's the wrong question. – Herdsman May 06 '20 at 20:10
  • @Kevin I am trying to do the same macro `container_of` does, just look at the source. – Herdsman May 06 '20 at 20:11
  • @Herdsman "The compiler didn't gave any specific error, just crashed." This sentence states that it's the compiler that crashed. – Thomas Jager May 06 '20 at 20:11
  • Your program is dereferencing NULL pointer. The `typeof` is only taking the type of the field. It's a gray area, but defined well enough by your toolchain provided it has this macro. – Eugene Sh. May 06 '20 at 20:12
  • Yes, explicitly. But people usually say that when an error occurs, and everyone knows it means a program crash, never a compiler crash. – Herdsman May 06 '20 at 20:12
  • Here is the related discussion: https://stackoverflow.com/questions/57342141/does-this-implementation-of-offsetof-invoke-undefined-behavior – Eugene Sh. May 06 '20 at 20:14
  • Re "*But you usually say that,*", No, no one says the compiler crashed when they mean the program crashed. Fixed the question. (It is possible for the compiler to crash; it's just not likely for well-seasoned products like `gcc`.) – ikegami May 06 '20 at 20:14
  • @EugeneSh. I still don't get it. How can `__typeof__(x)` use the 0 address without dereferencing it? What else are addresses for? – Herdsman May 06 '20 at 20:16
  • It is not accessing any actual memory pointed by `0`, it just knows the type of a field of the given structure - on compiler level in compile time. Memory access (and "crash") is happening in runtime. – Eugene Sh. May 06 '20 at 20:17
  • Can you please elaborate on this statement? `it just knows the type of a field of the given structure - on compiler level in compile time` – Herdsman May 06 '20 at 20:19
  • What is not clear in it? Compiler knows the types of the objects it is defining. It does not need to run the program for that. When you have `int x` in your program and then `typeof(x)` elsewhere, you need not run the program to realize that `typeof(x)` is `int` – Eugene Sh. May 06 '20 at 20:22
  • @EugeneSh. Yes, you indeed don't need to run the program to realize that `typeof(x)` is int (a basic type). But you also don't use a null pointer in your case. My case is a little bit different, as you see. – Herdsman May 06 '20 at 20:25
  • @Herdsman That was simplified. Now consider `typeof((int*)0)`. Can you infer the type of the `typeof` argument? Sure you can. Now consider `struct s {int a; char b;}` and `typeof(((struct s*)0) -> a)` - can you infer the type? Of course. – Eugene Sh. May 06 '20 at 20:28
  • Of course I can; I'm just not sure the compiler can as well. `((type *)0)->member` designates the lvalue of the member `member` of the structure to which `(type *)0` points. But `(type *)0` does not point to a structure, and therefore there is no member this can be the lvalue of. – Herdsman May 06 '20 at 20:31
  • So you can rest assured the compiler can - and this is why this macro works. But your question is not about the macro but about your code which is actually trying to access invalid memory location. – Eugene Sh. May 06 '20 at 20:33
  • Just read the rest of my last comment: you suggest the compiler will "infer", but `((type *)0)` does not point to a structure, and therefore there is no member this can be the lvalue of. If the compiler can "infer" the lvalue of "nothing", then the compiler is surely better than humans are. – Herdsman May 06 '20 at 20:37
  • @Herdsman Then I will try an analogy. If it doesn't help, I'll let other people continue. Can you tell what the gender of Snow White is? Sure you can, even though there is no such real person. In the same way, here you are asking "what would be the type of the field `a` of the structure `s` if such a structure were located at address `0`?" Instead of zero you can have any other arbitrary number and it will not change the outcome; it's just that `0` has a special meaning in C and can be cast to a pointer type without triggering warnings. – Eugene Sh. May 06 '20 at 20:41
  • @EugeneSh. OK, I did not know that feature of gcc. I haven't read about `0` having a special meaning for the compiler; I will try to search for something about it. – Herdsman May 06 '20 at 20:47
  • Note also that the compiler writer is permitted to use constructs, e.g. within system headers, that are unavailable to the user. For example, identifiers beginning with two underscores. Using parts of the compiler as examples of how to write standard-conforming code is Not Recommended. – mlp May 06 '20 at 22:09



Your question is really about the difference between

typeof( ((struct Foo*)0)->a )    // Relevant code from the macro.

and

int i = ((struct Foo*)0)->a;     // Relevant code from your program.

Let's start by using a valid pointer p instead of 0, and ask ourselves what the following two snippets do:

struct Foo s = { 0 };
struct Foo *p = &s;

typeof( p->a )

and

int i = p->a;

In the first case, we are trying to get the type of a member of a structure. The value of p is irrelevant; the compiler only needs its type. In fact, the result is computed during compilation, before p is allocated or assigned a value.

In the second case, we are trying to read memory. This memory will be found at some location relative to the pointer in p. Different values of p will result in different memory locations being read.


So what happens when we use 0 instead of a valid pointer?

In the first case, we never had a valid pointer. Because typeof is evaluated during compilation, the pointer doesn't even exist when typeof is evaluated. So that means that the following could conceptually work fine:

typeof( ((struct Foo*)0)->a )

And this brings us to the second case.

int i = ((struct Foo*)0)->a;

0 means NULL when used as a pointer, and a null pointer may not be address zero at all.

This tries to read memory some number of bytes after NULL. But NULL isn't an address; it's the lack thereof. The concept of reading memory at an address relative to NULL is flawed and meaningless. Since the concept is meaningless, it can't possibly work.


What does the standard say for typeof( ((struct Foo*)0)->a )?

I don't know.


What does the standard say for int i = ((struct Foo*)0)->a;?

The C language doesn't define what happens in that situation. We call this undefined behaviour. The compiler is free to do whatever it wants when it encounters it. Commonly, it results in a protection fault (which results in a SIGSEGV signal on unix systems).

$ gcc -Wall -Wextra -pedantic a.c -o a     # OP's program
a.c: In function ‘main’:
a.c:11:11: warning: unused variable ‘abc’ [-Wunused-variable]
    str_t *abc = (str_t*)((((str_t*)0)->a)-offsetof(str_t,a));
           ^~~

$ ./a
Segmentation fault (core dumped)
ikegami
  • according to this https://stackoverflow.com/questions/57342141/does-this-implementation-of-offsetof-invoke-undefined-behavior, `((str_t*)0)->member` is not NULL – Herdsman May 06 '20 at 20:33
  • Please read again. The answer to the linked question says "*the behavior of `((size_t)&((type *)0)->member)` is not specified by the C standard*" just like I did. – ikegami May 06 '20 at 20:35
  • But there is difference between "not specified by the C standard" and "SIGSEGV", which is well specified. Now you state both. – Herdsman May 06 '20 at 20:39
  • Yes, I did state both what the standard says and what usually happens. – ikegami May 06 '20 at 20:41
  • OK, then please explain why the macro `container_of` can use `((type *)0)->member` without either a `SIGSEGV` or a `not specified by C lang.` error? – Herdsman May 06 '20 at 20:45
  • I never said it will result in a SIGSEGV. I never said it resulted in a protection fault. I only said it *usually* does the latter. I said the behaviour is undefined, which means the standard is silent on what should happen. As such, the compiler is free to do whatever it wants when it encounters it. I have extended my answer to clarify what "undefined behaviour" means. – ikegami May 06 '20 at 20:48
  • I have greatly extended my answer. – ikegami May 06 '20 at 21:35
  • How is it possible that `typeof` is evaluated at compile time? It only needs a type, but a type and its alignment in memory (that is what makes a type: its memory representation, e.g. int 4 bytes, char 1 byte, struct member n bytes, etc.) could not be resolved at compile time, because it depends on memory. `typeof` and its implementation in the compiler are still a little magic to me. – Herdsman May 07 '20 at 10:00
  • You are mistaken. All those things are chosen by the compiler – ikegami May 07 '20 at 15:52
  • I know I should avoid discussion and will probably ask another question on this topic, but I am still not sure how the compiler works with types. I always thought types in C are a memory thing, and memory is NOT resolved at compile time but at link time. It would be great to add some details on how the compiler internally represents a type, or whether the linker does this job along with alignment. Or is there an existing question on this topic somewhere? Just provide some links if so. – Herdsman May 07 '20 at 17:00
  • The location in memory is the only thing decided at runtime. (Even that is decided at compile time for static objects. Well, link time to be precise.) The fact that an object is a `int`, the size of an `int`, the alignment of an `int`, etc are all known to the compiler. How else could it generate code that manipulates them? – ikegami May 07 '20 at 17:04
  • But if the compiler determines the byte size of an arbitrary type at compile time, and I can make an arbitrary type like `typedef struct` with an arbitrary byte size (depending on the members, which are or aren't known types), then how does it do that? Before generating asm (at which point it has to know the types), is this information stored in the compiler's symbol table? How does the compiler do such a thing? – Herdsman May 07 '20 at 17:36
  • Re "*I can make arbitrary type [...] with [...] members, which are or aren't known types*", No, you can't. Members of a struct must have known types. Say you had `struct Foo { Type1 a; Type2 b; };`. In order to do `struct Foo s;`, the compiler needs to know the size and alignment restrictions of `Type1` and `Type2`. In order to generate the code for `s.b` or `p->b`, the compiler needs to know the offset of `b` into the struct, which means the compiler needs to know the size of `Type1`. It must also know the size of `Type2`. As such, both `Type1` and `Type2` must be known types. – ikegami May 07 '20 at 17:59
  • So if I want to build a type still unknown to the compiler (for example: 15 bytes long, with different data in each byte representing arbitrary information), how can I do that? How can I add new types, with their behaviour and representation/alignment, to the gcc compiler? – Herdsman May 07 '20 at 18:06
  • You can compose types (structs, arrays), but you can't otherwise create new types (e.g. a 24-bit integer if none of the integers provided by the compiler are 24-bits). If you want to store information into 15 bytes in a format of your choice, you can use `char a[15]`. (Well, on some systems; a `char` can be more than 8 bits.) You might do this when building a message that requires big-endian integers on a little-endian machine, for example. – ikegami May 07 '20 at 18:15