C: how does dot/arrow operator work under the hood?

Question

From what I understand, given

int *x = malloc(10 * sizeof(int));
x[5] = 13;

malloc just allocates empty space (with no assumption about the object that will be put there), and x[5] translates to *(x + 5) which is treated as an integer. So, it is left to the [] operator to create the illusion of an array.
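The equivalence described above can be checked directly. This is only a minimal sketch; the helper names `same_element`, `element_byte_offset` and `store_and_read` are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if x[i] and *(x + i) designate the same int. */
int same_element(int *x, size_t i)
{
    return &x[i] == x + i;
}

/* Hypothetical helper: distance in bytes from the start of the block to element i. */
size_t element_byte_offset(int *x, size_t i)
{
    return (size_t)((char *)&x[i] - (char *)x);
}

/* Hypothetical demo: store through one spelling, read back through the other. */
int store_and_read(void)
{
    int *x = malloc(10 * sizeof(int));
    assert(x != NULL);      /* keep the sketch simple: abort if allocation fails */
    x[5] = 13;              /* subscript form */
    int value = *(x + 5);   /* pointer-arithmetic form, same object */
    free(x);
    return value;
}
```

Calling `store_and_read()` returns the value written through `x[5]`, and `element_byte_offset(x, 5)` equals `5 * sizeof(int)`, because pointer arithmetic on an `int *` is scaled by the element size.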

But what happens in the following case?

struct test {
    int a;
    char b;
};

struct test* x = malloc(sizeof(struct test));
x->a = 3;
x->b = 'a';

Do x->a and x->b translate to some memory position in some regular way, like the [i] operator does? Does the C reference state anything, or is it implementation specific? I've been looking through various books, but, contrary to arrays, structs are always presented as a black box.

Are you asking about how the struct members are laid out in memory? — , Jun 18 '19 at 14:14
Probably the linker either replaces the lines by the base address of the `struct` plus the offset or with the direct address of the variable inside of the structure. — Yastanub, Jun 18 '19 at 14:14
*"Do the x->a, x->b translate to some memory position in some regular way"* Exactly. It knows the offsets of `a` and `b`, and when you access those members, those offsets are "hardcoded" in the machine code. — Blaze, Jun 18 '19 at 14:15
The basic answer is, "yes". `x->a` basically means, "take the pointer `x`, add the offset of field `a`, and dereference the new pointer as per the type of field `a`." (It's a little more complicated than that, because `x->a` is actually shorthand for `(*x).a`, but it works out just about the same in the end.) — Steve Summit, Jun 18 '19 at 14:15
They are called Pointers. Check the Link for Details https://www.tutorialspoint.com/cprogramming/c_pointers.htm — Sayed Muhammad Idrees, Jun 18 '19 at 14:15
Each member of a struct has an offset from the beginning of the struct. The compiler knows those offsets, and it ends up being very similar to indexing an array, yes — Shawn, Jun 18 '19 at 14:15
I have often thought of structures as being a generalization of arrays, where the fields are not all of the same type. (Also you access the fields by name rather than numeric index. Also the name you use when accessing one has to be a compile-time constant.) — Steve Summit, Jun 18 '19 at 14:19
@Blaze: thanks, I suspected that. Do you know of any resource that describes it (reference or something less official)? — blue_note, Jun 18 '19 at 14:22
You might like to look at the macro `offsetof` to be found in `#include <stddef.h>`. — Weather Vane, Jun 18 '19 at 14:23
Err, that's not a duplicate at all. Voting to re-open this, as I had already typed out a long answer. If you have a good dupe by all means close it, but I don't think this is one. — Lundin, Jun 18 '19 at 14:34

score 2 · Answer 1 · answered Jun 18 '19 at 14:20

Let's say an int is 4 bytes and a char is 1 byte (these sizes are typical, though only sizeof(char) == 1 is guaranteed). Then the members of struct test occupy 5 consecutive bytes in memory: first a (4 bytes), then b (1 byte). The compiler will usually add trailing padding, so sizeof(struct test) is often 8 rather than 5.

If you then access x->b, you are pointing to the start of that struct plus an offset of 4 bytes. Since x is a pointer to the struct, ->a roughly means +0 and ->b roughly means +4.
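To make the "base address plus offset" idea concrete, here is a rough sketch that performs the member access by hand with the standard `offsetof` macro from `<stddef.h>`. The helper names `read_a_by_offset` and `read_b_by_offset` are hypothetical, and the actual offset of `b` is whatever the compiler chose, not necessarily 4.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct test {
    int a;
    char b;
};

/* Hypothetical helper: does by hand roughly what x->a does. */
int read_a_by_offset(struct test *x)
{
    assert(x != NULL);
    char *base = (char *)x;                            /* address of the struct in bytes */
    return *(int *)(base + offsetof(struct test, a));  /* first member: offset 0 */
}

/* Hypothetical helper: does by hand roughly what x->b does. */
char read_b_by_offset(struct test *x)
{
    assert(x != NULL);
    char *base = (char *)x;
    return *(base + offsetof(struct test, b));         /* offset chosen by the compiler */
}
```

After `x->a = 3; x->b = 'a';`, the two helpers return the same values as `x->a` and `x->b`. The offsets are compile-time constants, which is why the compiler can bake them straight into the generated code.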

wohe1

https://fresh2refresh.com/c-programming/c-struct-memory-allocation/ but the point I was trying to make is that the _offset_ is known – wohe1 Jun 18 '19 at 14:34

score 1 · Accepted Answer · answered Jun 18 '19 at 14:34

"malloc just allocates empty space (with no assumption about the object that will be put there)"

Correct. Dynamically allocated memory specifically has no type until the point where you write something to that area. Formally, the C language calls this the effective type. The formal definition is found in C17 6.5/6:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

What's returned from malloc is just a raw chunk of memory with no special attributes until the point where you write to that area, after which the compiler has to put a "type label" on it internally. As soon as you access it by using [], the compiler has to assume that the allocated data is to be treated as an array, to keep the type system consistent between statically allocated and dynamically allocated objects.

Similarly, the memory area becomes a struct at the point where you access it through a struct type. The struct type may include padding, and it dictates the memory offset of each member. So given a struct with the opposite member order of your example, like this:

struct test {
    char a;
    int  b;
};

Then it is implementation-defined whether x->b will result in access to byte 1, byte 4 or something else, since the compiler is free to add padding between the members.
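A small sketch of how one might inspect what a particular compiler actually chose for this layout. The helper names `offset_of_b`, `padding_before_b` and `address_of_b` are invented for illustration; only `offsetof` and `sizeof` are standard.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

struct test {
    char a;
    int  b;
};

/* Hypothetical helper: where this compiler placed member b. */
size_t offset_of_b(void)
{
    return offsetof(struct test, b);
}

/* Hypothetical helper: padding bytes inserted between a and b. */
size_t padding_before_b(void)
{
    return offsetof(struct test, b) - sizeof(char);
}

/* Hypothetical helper: computes &x->b from the base address and the offset. */
int *address_of_b(struct test *x)
{
    assert(x != NULL);
    return (int *)((char *)x + offsetof(struct test, b));
}
```

On many mainstream platforms `offset_of_b()` yields 4 (three padding bytes after `a`), but that is a property of the implementation, not of the language. Whatever the value is, `address_of_b(x)` and `&x->b` give the same address.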

But as soon as you access x->something, the compiler will have to start regarding whatever x points at as effective type struct test, or the type system wouldn't behave consistently.
