
The ISO C90 Standard (or at least the draft of it that I have) says this about malloc and alignment:

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated...

But can you use the same pointer returned by malloc for two different types? For example, suppose that I know that sizeof(int) <= 2 * sizeof(short). Could I allocate enough memory for 5 shorts, and use the first two as an int, i.e. is the following code guaranteed to work as intended?

#include <stdio.h>
#include <stdlib.h>
int main(void) {
    void* data = malloc(5 * sizeof(short));
    short* short_array = data;
    int* int_ptr = data;
    if (!data) return EXIT_FAILURE;
    *int_ptr = 13943;
    short_array += 2; /* Skip over the int */
    short_array[0] = 7;
    short_array[1] = 238;
    short_array[2] = -123;
    printf("%d %d %d %d\n", *int_ptr, short_array[0], short_array[1], short_array[2]);
    free(data);
    return 0;
}

I've tried this code, and it does output 13943 7 238 -123 for me, but I'm not entirely sure if it's standard-compliant.


Edit: Specifically, I'm trying to make a dynamic array type (which can be an array of any type), so I'm allocating an array of one type, and using the start of that allocation as a pointer to a header which contains the length and capacity of the array.

To be clear, here is approximately what I'm doing:

size_t header_elements = (sizeof(ArrayHeader) + array_type_size - 1) / array_type_size; /* = ceil(sizeof(ArrayHeader) / array_type_size) */
void* data = malloc((header_elements + array_length) * array_type_size);
ArrayHeader* header = data;
void* array = (char*)data + header_elements * array_type_size;

So, header points to the start of the allocation and the actual array is offset by a multiple of the size of the type stored in it.
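The layout described above can be sketched as a few helper functions. This is only a minimal sketch: the `ArrayHeader` fields and the names `array_create`, `array_header` and `array_destroy` are assumptions for illustration, not part of the original code. It relies on the element offset being a multiple of the element size, which keeps the elements aligned for any type whose size is a multiple of its alignment requirement, and it does not check the size computation for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header layout; the question does not show its definition. */
typedef struct {
    size_t length;
    size_t capacity;
} ArrayHeader;

/* Number of elements of size elem_size needed to cover the header,
   rounded up (the same formula as in the question). */
size_t header_elements(size_t elem_size) {
    assert(elem_size > 0);
    return (sizeof(ArrayHeader) + elem_size - 1) / elem_size;
}

/* Allocates the header plus room for `capacity` elements.
   Returns a pointer to the first element, or NULL on failure. */
void *array_create(size_t elem_size, size_t capacity) {
    size_t skip;
    ArrayHeader *header;
    assert(elem_size > 0);
    skip = header_elements(elem_size);
    header = malloc((skip + capacity) * elem_size);
    if (!header) return NULL;
    header->length = 0;
    header->capacity = capacity;
    return (char *)header + skip * elem_size;
}

/* Recovers the header from a pointer returned by array_create. */
ArrayHeader *array_header(void *array, size_t elem_size) {
    return (ArrayHeader *)((char *)array - header_elements(elem_size) * elem_size);
}

/* Frees the whole allocation, which starts at the header. */
void array_destroy(void *array, size_t elem_size) {
    if (array) free(array_header(array, elem_size));
}
```

A caller would write something like `int *a = array_create(sizeof(int), 4);`, use `a[0]` to `a[3]` as ordinary ints, and release the memory with `array_destroy(a, sizeof(int))`. The header and the elements never share bytes, which is the non-overlapping case the question asks about.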

pommicket
  • I believe it is – Rishikesh Raje Jul 17 '19 at 05:51
  • @chux: why delete your answer? – chqrlie Jul 17 '19 at 06:27
  • The point in DR28 is that the compiler *can* optimize, but it is not explicitly mentioned in the C90 text so you cannot find any relevant quote in it. The point is that they *both* are pointing to the same location, the one that in C99 was worded as the effective type stuff. – Antti Haapala -- Слава Україні Jul 17 '19 at 09:11
  • Oops yes, I misread that. But anyways, that's saying the compiler can optimize it even if the two pointers overlap, which isn't true in my case. – pommicket Jul 17 '19 at 09:16
  • *using the start of that allocation as a pointer to a header which contains the length and capacity of the array* Which means the remainder of the array is no longer "suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated" unless you've taken care to ensure the size of your header matches the required alignment. With C11 and later you can use `_Alignof (max_align_t)` to determine that. – Andrew Henle Jul 17 '19 at 09:41
  • @AndrewHenle Are you sure? I'm not just adding the size of the header to the pointer and using it as the array (see the code I just added). – pommicket Jul 17 '19 at 11:00
  • @LeoTenenbaum [I'm sure](https://stackoverflow.com/questions/42630843/bus-error-with-allocated-memory-on-a-heap). Just because it works on x86 systems doesn't mean you're not violating an alignment restriction. The way you're doing it now looks marginally OK, though, as you're assuming `int` is twice the size of `short`. It doesn't have to be, but likely is on your system. (Your code was not posted when I made my original comment) – Andrew Henle Jul 17 '19 at 11:19
  • @AndrewHenle That example is different, because they're using `new` and giving it the type, and `new` probably aligns based on the type (also, even if they did use `malloc`, 250 might not be a multiple of `sizeof(int)`, so it wouldn't necessarily be aligned properly). `malloc`, on the other hand is guaranteed to be aligned for all types (i.e. I could remove either the `*int_ptr = ...` line or the `short_ptr[x] = y` lines and it would definitely comply to the standard); I'm just not sure if it'll work with two different types at the same time. – pommicket Jul 17 '19 at 12:23
  • @LeoTenenbaum *because they're using `new`* That's irrelevant to alignment restrictions. Do you understand [strict aliasing](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule)? Have you ever written code for non-x86 systems that will fail to run code that does misaligned accesses? `short_array += 2;` in your code in isolation is correct. This is not strictly correct: `/* Skip over the int */` What if a `short` is 2 bytes but an `int` is 8 bytes? Your code makes size assumptions that are not safe. – Andrew Henle Jul 17 '19 at 12:46
  • @AndrewHenle **suppose that I know that `sizeof(int) <= 2 * sizeof(short)`**. In the code below that (which is closer to the code that I'm actually using), I actually do make sure that the header and the array don't overlap, like replacing `short_array += 2` with `short_array += ceiling(sizeof(int) / sizeof(short))` (and of course allocating enough space to do so). – pommicket Jul 17 '19 at 12:53
  • @LeoTenenbaum I see no evidence at all that you've considered [overaligned types](https://stackoverflow.com/questions/8732441/what-is-overalignment-of-execution-regions-and-input-sections) – Andrew Henle Jul 17 '19 at 13:06
  • @AndrewHenle Suppose on some (strange) system, a `short` must be at a memory address which is a multiple of 3 and an `int` must be at a memory address which is a multiple of 5. (so `sizeof(int) <= 2 * sizeof(short)`). `malloc` must return a pointer which can both be used as a `short*` *and* as an `int*`, so it must be a multiple of 15. Let's say it picks memory address 90. What I'm doing is using address 90 (which is a multiple of 5) as an `int`, and using addresses 96, 99, and 102 (which are all multiples of 3) as `short`s. So, on that hypothetical system, this would work. – pommicket Jul 17 '19 at 13:48
  • @AndrewHenle: While DR#028 was correct in saying that the indicated optimization should be allowed, the stated rationalization is nonsensical. If one recognizes the rules as saying that a byte which is used as a type T in some context may only subsequently be accessed by an lvalue which is derived, within that context, from an object of one of the listed types, that will be sufficient to define the behavior of allocated storage without any need for "effective types" based on the nonsensical rationale of DR#028. – supercat Aug 08 '19 at 21:09

1 Answer


C90 ought to be dead by now, rest in peace. The C99, C11 and C18 behaviour is what should be considered here. Those standards describe this in terms of effective types: because the object allocated by malloc has no declared type, its effective type is determined by the type of the lvalue last used to store into it, and the compiler may rely on that.

If you write an int to the first sizeof(int) bytes, then the compiler is allowed to consider the datatype of those bytes to be int. If you then write shorts to the following bytes, their effective type will be short. If the storage does not overlap, then your code is OK.

But be careful: if you overlap the storage, i.e. you write a short over the int and then read back the int, then all bets are off.
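To make the overlapping case concrete, here is a small sketch of my reading of the C99 effective-type rule (6.5p6). The function name `reuse_storage` is made up for illustration. Storing through an lvalue of a new type changes the effective type of those bytes, so each read must use the type of the most recent store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stores an int and then a short into the same malloc'd bytes.
   Returns 1 on success, 0 if the allocation failed. */
int reuse_storage(int *as_int, short *as_short) {
    size_t size = sizeof(int) > sizeof(short) ? sizeof(int) : sizeof(short);
    void *data = malloc(size);
    assert(as_int != NULL && as_short != NULL);
    if (!data) return 0;

    *(int *)data = 13943;       /* effective type of these bytes is now int */
    *as_int = *(int *)data;     /* fine: read with the type that was stored */

    *(short *)data = 7;         /* the store changes the effective type to short */
    *as_short = *(short *)data; /* fine: read with the type that was stored */

    /* Reading *(int *)data at this point would be the "all bets are off"
       case: the bytes were last written as a short. */

    free(data);
    return 1;
}
```

The non-overlapping layout in the question never reaches the commented-out case, because the int and the shorts occupy disjoint bytes.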

Finally, there is a rather good way of avoiding the ambiguity: use a struct type. How about just using

struct two_types_in_one_malloc {
    int the_int;
    short the_shorts[3];
};
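As a rough usage sketch, the struct can be allocated with a single malloc and the members accessed by name, so the compiler takes care of alignment and padding between the int and the shorts. The helper name `make_two_types` is an assumption for illustration, and the values are the ones from the question.

```c
#include <assert.h>
#include <stdlib.h>

struct two_types_in_one_malloc {
    int the_int;
    short the_shorts[3];
};

/* Hypothetical helper: allocates one struct and fills it with the
   values used in the question. Returns NULL if malloc fails. */
struct two_types_in_one_malloc *make_two_types(void) {
    struct two_types_in_one_malloc *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->the_int = 13943;
    p->the_shorts[0] = 7;
    p->the_shorts[1] = 238;
    p->the_shorts[2] = -123;
    assert(p->the_int == 13943); /* members do not overlap */
    return p;
}
```

The caller owns the returned pointer and releases it with `free`.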
  • The particular thing I'm trying to do is make a dynamic array type (so I'm allocating space for an array, and using the first part as a header for the length and capacity), but I'd like it to work for all types, which is why I can't "just" use a struct. As for "The C99, C11 and C18 behaviour is the one that should be considered here", I was asking about C90, but oh well... – pommicket Jul 17 '19 at 08:10
  • Then please edit your question to say so! Good motivation – Antti Haapala -- Слава Україні Jul 17 '19 at 08:32
  • @LeoTenenbaum [This is the reason why the C90 references are not useful](https://kristerw.blogspot.com/2017/07/strict-aliasing-in-c90-vs-c99-and-how.html) – Antti Haapala -- Слава Україні Jul 17 '19 at 08:34