
I've just been reading up on type punning, strict aliasing and alignment and was having trouble understanding pointer alignment issues. I know that there are exceptions when using char* and was wondering if the following was safe to do:

char* test = malloc(sizeof(struct foo));
((struct foo*)test)->member = 10;

Apparently the safest way to do something like this would be to use memcpy:

char* test = malloc(sizeof(struct foo));
struct foo A;
A.member = 10;
memcpy(test, &A, sizeof A);

But then what if you wanted to access/change those values:

((struct foo*)test)->member = 10;

If this wasn't safe to do due to pointer alignment then you would have to do 2 memcpys involving a tmp variable?

struct foo tmp;
memcpy(&tmp, test, sizeof tmp);
tmp.member = 20;
memcpy(test, &tmp, sizeof tmp);

Surely this is more inefficient than a pointer cast? I'm not sure if all compilers would optimize out the memcpys.

For those wondering as to why you would want to do this, I was thinking that maybe you might want to store a char* to a value of any type (yes I know you can use a union for this, but what if this was abstracted away so you would not be able to place your own types in said union).

And what about the case where the char* refers to an array of struct foo?

for (size_t i = 0; i < size; ++i)
{
  ((struct foo*)test)[i].member = 10;
}

as opposed to:

for (size_t i = 0, j = 0; i < size; ++i, j += sizeof (struct foo))
{
  struct foo tmp;
  memcpy(&tmp, test + j, sizeof tmp);
  tmp.member = 20;
  memcpy(test + j, &tmp, sizeof tmp);
}
Jonathan Leffler
name
  • There surely are bigger issues hidden behind your code. Alignment being the first to come to mind. – iBug Jul 08 '21 at 21:20
  • The pointer returned by `malloc()` is always sufficiently well aligned to be used for any purpose (including arrays). (This does not apply to general `char *` values.) You could improve `char* test = malloc(sizeof(struct foo)); ((struct foo*)test)->member = 10;` by using `struct foo *test = malloc(sizeof(*test)); test->member = 10;` but there isn't a significant difference. See [§7.22.3 Memory management functions ¶1](http://port70.net/~nsz/c/c11/n1570.html#7.22.3p1) for more details. (Don't forget to error check the allocation; I assume it is omitted for compactness.) – Jonathan Leffler Jul 08 '21 at 21:22
  • See also [What is the strict aliasing rule?](https://stackoverflow.com/q/98650/15168) and links leading from it. Consider whether `void *` is a better choice than `char *`. – Jonathan Leffler Jul 08 '21 at 21:28
  • @iBug If the `char *` is coming from `malloc`, it is guaranteed to be aligned for most purposes (e.g. 8 byte aligned, at least), so it works for common types such as: `double`. Otherwise, `double *f = malloc(sizeof(*f));` would be inherently broken. As OP is using it, doing `char *test = malloc(...);` doesn't affect that alignment (i.e. `malloc` provides the same alignment regardless of what the type of the pointer variable caller uses). – Craig Estey Jul 08 '21 at 21:30
  • @JonathanLeffler Could you clarify what you mean by "general `char *` values". Also I didn't use `struct foo *test = malloc(sizeof(*test)); test->member = 10;` as explained in the case where if you would want to store a pointer to an arbitrary value. – name Jul 08 '21 at 21:30
  • @JonathanLeffler I also thought that `char*` were exceptions to the strict aliasing rule? Also you cannot perform pointer math on `void*` so I used `char*` in case you wanted to have a dynamic array of arbitrary type and stored it using a `char*` to the first and last element using `end - start` pointer math to get its size. – name Jul 08 '21 at 21:32
  • If you have `char data[32];`, then at least one of `char *test1 = &data[1];` and `char *test2 = &data[2];` will be aligned on an odd byte address, which makes the `char *` address in either `test1` or `test2` inappropriate for general use — you need the extra alignment promises made by (on behalf of) `malloc()` et al for the result of converting the `void *` the allocation functions return into a `char *`, and then converting the `char *` into a `something_else *` (e.g. `struct foo *`). – Jonathan Leffler Jul 08 '21 at 21:34
  • @JonathanLeffler Ah, I see. Thanks for clearing that up. So in the example I gave, a simple cast should be fine? Assuming that `char*` are safe from the strict aliasing rule that is. – name Jul 08 '21 at 21:36
  • Correct: you cannot do arithmetic on a `void *` according to Standard C, even though GCC does allow it. I think GCC is flexible about it in part because of [§6.2.5 Types ¶28](http://port70.net/~nsz/c/c11/n1570.html#6.2.5p28): _A pointer to void shall have the same representation and alignment requirements as a pointer to a character type_ (and footnote 48 which says _The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions_). – Jonathan Leffler Jul 08 '21 at 21:38
  • Strict aliasing is defined in [§6.5 Expressions ¶7](http://port70.net/~nsz/c/c11/n1570.html#6.5p7). It's a bit long to quote, but the list of acceptable mechanisms for accessing an object includes "a character type". – Jonathan Leffler Jul 08 '21 at 21:44
  • I went down the strict aliasing rabbit hole once, still a bit confused when I came out. From what I remember, `char*` can safely alias anything, but that's not a two-way street. One solution I saw (linked above), was to put all the pointers you want to convert between in a `union`, and then use whatever type you want. However, I think that gets into "accessing a union member that wasn't last accessed is UB" territory, so still fuzzy on that (although I think practically all compilers have extensions that allow you to safely do this). – yano Jul 08 '21 at 21:44
  • @yano `"accessing a union member that wasn't last accessed is UB"` I think that only applies to cpp. I think that is fine to do in c (but the examples do not involve unions of pointers, if that's what you meant). But as stated in the question I am well aware you can use a union, but there are cases when you would not want to use one. – name Jul 08 '21 at 21:46
  • Earlier, I [said](https://stackoverflow.com/q/68308742/15168#comment120726051_68308742) "at least one … will be aligned on an odd byte address". That's true, but "exactly one" is also true. More significantly, on some machines, especially RISC machines, it's likely that both the `test1` and `test2` pointers will not be adequately aligned for accessing larger types such as `long`, `long long`, `double`, `long double` (and usually just `int` too). Sometimes, the o/s will handle the misaligned access, but doing so is expensive (DEC Alpha and OSF/1, for those who were around long enough ago). – Jonathan Leffler Jul 08 '21 at 21:51
  • according to [this answer](https://stackoverflow.com/a/252568/3476780) it's UB in C, but that's from '08, maybe things have changed in C11. – yano Jul 08 '21 at 21:54
  • [This question](https://stackoverflow.com/questions/2310483/purpose-of-unions-in-c-and-c) also says that from C99 and onward, it's legal to do so but can still be UB if the value read is invalid (not sure what constitutes an invalid value), and then in the next sentence says it can be implementation defined. I kind of like the top answer: _The purpose of union is to save memory by using the same memory region for storing different objects at different times. That's it._ So as before, my take away from this and strict aliasing is a shrug, hope you can understand better than me. – yano Jul 08 '21 at 22:01
  • @yano According to this [link](https://stackoverflow.com/a/11996970/9642458) it's defined in C11 :D – name Jul 08 '21 at 22:01
  • @yano I see... Well this rabbit hole has certainly been a struggle for me haha. – name Jul 08 '21 at 22:03
  • Nowhere else seems to mention `memcpy` being a problem, so I guess I'll have to stick to that. According to [this](https://godbolt.org/z/b9szcjM6j) the compiled code for `MSVC` (what I use) seems to be the same with pointer casts and `memcpy`. – name Jul 08 '21 at 22:05
  • @JonathanLeffler: `malloc` always provides memory with fundamental alignment, not for any purpose. If you need memory with a stricter requirement, such as an extended alignment requirement or a page boundary, you cannot rely on `malloc` absent guarantees beyond the C standard. – Eric Postpischil Jul 08 '21 at 23:36
  • @EricPostpischil — Yeah. You’re right — but that’s why I link to a (draft of) the standard so you can see the exact wording. – Jonathan Leffler Jul 08 '21 at 23:46
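
The alignment point raised in the comments can be sketched roughly as follows. This is an illustration only, not code from the thread: `write_int_at`, `read_int_at` and `unaligned_round_trip` are hypothetical names. It relies on the fact that `memcpy` has no alignment requirement, so it can move an `int` in and out of an odd offset of a plain `char` array, where a cast to `int *` could be misaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store an int at an arbitrary byte offset.
 * memcpy copies bytes, so buf + offset need not be aligned for int. */
void write_int_at(char *buf, size_t offset, int value)
{
    memcpy(buf + offset, &value, sizeof value);
}

/* Hypothetical helper: load an int from an arbitrary byte offset. */
int read_int_at(const char *buf, size_t offset)
{
    int value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Round-trip a value through offset 1 of a char array. Casting
 * &data[1] to int * and dereferencing it would not be portable,
 * because that address may not satisfy the alignment of int. */
int unaligned_round_trip(int value)
{
    char data[32];
    assert(1 + sizeof value <= sizeof data);  /* the int must fit in the buffer */
    write_int_at(data, 1, value);
    return read_int_at(data, 1);
}
```

Memory obtained from `malloc` does not need this treatment for ordinary types, since it carries the alignment guarantee discussed above.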

1 Answer


malloc is intended to be used to reserve memory and create new objects in that memory. This code is fully defined by the C standard (supposing the appropriate definitions and #include lines precede it, of course):

void *test = malloc(sizeof (struct foo));
((struct foo *) test)->member = 10;

There is no aliasing problem here because the reserved memory initially has no object type associated with it. Once a value is stored into it through an lvalue of any type other than a character type, the type of that lvalue becomes the effective type for the memory. The address returned by malloc is suitable for all the basic types (char, integer types, floating-point types), enumerated types, pointer types, arrays of these types, structures or unions of these types without an overriding alignment specifier, and all the complete object types in the standard library (like struct tm).
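
As a minimal sketch of that pattern extended to an array, with the allocation check included: the definition of `struct foo` and the name `make_foo_array` are assumptions for illustration, since the question does not show them.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in definition; the question does not show the real one. */
struct foo { int member; };

/* Hypothetical helper: allocate `count` structures and initialise each
 * one directly through a struct foo pointer, with no memcpy involved.
 * Returns NULL if the allocation fails. */
struct foo *make_foo_array(size_t count, int value)
{
    assert(count > 0);                     /* a zero-size request is not useful here */
    void *test = malloc(count * sizeof(struct foo));
    if (test == NULL)
        return NULL;

    struct foo *items = test;              /* void * converts implicitly in C */
    for (size_t i = 0; i < count; ++i)
        items[i].member = value;           /* the store gives the memory its effective type */
    return items;
}
```

The caller owns the returned block and releases it with `free`.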

Pedantically, there are some inadequacies in the standard. The assignment above only writes to a member of a structure, not the whole structure, so is the effective type of the allocated memory set to be that of the whole structure? The language in the standard does not formally cover this and similar issues. However, it is entirely clear this code is intended to work, and all compilers support it.

Similarly, there are some inadequacies in the standard regarding pointer conversions, which is why I changed your code to use void *test instead of char *test. The standard does not explicitly say that converting the void * result of malloc to some other type, like struct foo *, actually produces a pointer to the same memory. (It just says that, if we convert the pointer back, we will get something equal to the original.) But it is clear memory reserved by malloc is intended to be used in this way. Inserting another conversion, first to char * and then to struct foo *, complicates the matter further. Again, it should work, and it will work in all compilers, but the standard does not say it explicitly.

You can use memcpy to read or write objects in the allocated memory, but it is not necessary, unless you actually want to do aliasing where you have written an object of one type and want to reinterpret its bytes as another type. (This can be done with memcpy or with a union.)
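
A short sketch of both reinterpretation routes, using `float` and `uint32_t` as an example pair. The function names are hypothetical, and the example assumes the two types have the same size, which the `static_assert` checks at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A byte-for-byte reinterpretation only makes sense for equal sizes. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Reinterpret the bytes of a float by copying them into a new object. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Reinterpret the bytes of a float by writing one union member
 * and reading another. */
uint32_t float_bits_union(float f)
{
    union { float as_float; uint32_t as_bits; } pun;
    pun.as_float = f;
    return pun.as_bits;
}
```

Both functions return the same bit pattern for a given input. The exact value depends on the platform's floating-point representation.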

Eric Postpischil
  • This is a clear answer thanks! Do you mind clarifying what you mean by `structures or unions of these types without an overriding alignment specifier`. Are you referring to how some compilers may pad structures out? Also the reason why I chose to use `char*` instead of `void*` is in the scenario that you wish to do pointer math. For example, in a dynamic array that can store an arbitrary type, you can use a `char*` that points to the start and end of the array and do `end - start` to get its size in bytes. – name Jul 09 '21 at 01:36
  • Also regarding the last part about `memcpy` to reinterpret bytes. Does this work with types of different sizes? For example `struct foo { int* a; int b; int c; }` and `struct boo { int* a; int b; }` or even just to `int*`. Or would just casting a `struct foo*` to `int*` suffice? (Assume malloc was used to assign the pointer.) I'm not going to ask about union casts since it seems this is a GNU extension? At least `(union{ struct foo; struct boo; })(foo_ptr)` is, I think. And I believe the pointer version of the union cast (as well as just pointer casting) breaks strict aliasing rules. – name Jul 09 '21 at 01:48
  • @name: An alignment specifier is `_Alignas (type-name)` or `_Alignas (constant-expression)`. These can be used to declare objects with unusual alignment requirements. – Eric Postpischil Jul 09 '21 at 02:24
  • @name: I do not know what you want to do with objects of different sizes. Reinterpreting the bytes of objects should not be done by casting pointers to them. The ways supported by the C standard to reinterpret bytes are to copy the bytes into a new object or to use a union. This is not a GCC extension. – Eric Postpischil Jul 09 '21 at 02:25
  • I was thinking of using it to abstract or hide some members of a struct, for example in my `foo` `boo` case a user can see `a` and `b` on `boo` but not `c`. Another example could be to give a header to a buffer containing some information. You could `char* test = malloc(sizeof(header) + sizeof(item) * numitems)` and store the `struct header` at `test` and use `item* arrItems = test + sizeof(header)` to access the items (`char*` used again for pointer math). – name Jul 09 '21 at 02:35
  • Regarding union cast as a GNU extension I read it [here](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type%2Dpunning). And [this](https://stackoverflow.com/a/25672839/9642458) mentioned `GNU extensions to standard C++ (and to C90) do explicitly allow type-punning with unions.` – name Jul 09 '21 at 02:38
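
The header-plus-items layout described in the comments could be sketched along these lines. Everything here is an assumption for illustration: the `struct header` and `struct item` definitions and the function names are hypothetical, and `_Alignof` requires C11. One detail the comment does not cover is that `test + sizeof(header)` is only a valid address for an `item` if it is suitably aligned, so this sketch rounds the offset up to the alignment of `struct item`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct header { size_t count; };   /* hypothetical header type */
struct item   { double value; };   /* hypothetical element type */

/* Byte offset of the first item: sizeof(struct header) rounded up
 * to a multiple of the alignment of struct item. */
size_t items_offset(void)
{
    size_t align = _Alignof(struct item);
    return (sizeof(struct header) + align - 1) / align * align;
}

/* Locate the item array inside a block made by buffer_create. */
struct item *buffer_items(char *block)
{
    assert(block != NULL);
    return (struct item *)(block + items_offset());
}

/* Allocate one block holding a header followed by `count` items.
 * Returns NULL if the allocation fails. */
char *buffer_create(size_t count)
{
    char *block = malloc(items_offset() + count * sizeof(struct item));
    if (block == NULL)
        return NULL;

    ((struct header *)block)->count = count;   /* header lives at the start */
    struct item *items = buffer_items(block);
    for (size_t i = 0; i < count; ++i)
        items[i].value = 0.0;
    return block;
}
```

Keeping the block as a `char *` allows the byte arithmetic mentioned in the comments, and the whole block is released with a single `free`.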