18

When allocating memory for a variable sized array, I often do something like this:

struct array {
    long length;
    int *mem;
};

struct array *alloc_array( long length)
{
    struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
    arr->length = length;
    arr->mem = (int *)(arr + 1); /* dubious pointer manipulation */
    return arr;
}

I then use the array like this:

int main()
{
    struct array *arr = alloc_array( 10);
    for( int i = 0; i < 10; i++)
        arr->mem[i] = i;
    /* do something more meaningful */
    free( arr);
    return 0;
}

This works and compiles without warnings. Recently however, I read about strict aliasing. To my understanding, the code above is legal with regard to strict aliasing, because the memory being accessed through the int * is not the memory being accessed through the struct array *. Does the code in fact break strict aliasing rules? If so, how can it be modified not to break them?

I am aware that I could allocate the struct and array separately, but then I would need to free them separately too, presumably in some sort of free_array function. That would mean that I have to know the type of the memory I am freeing when I free it, which would complicate code. It would also likely be slower. That is not what I am looking for.

ego

3 Answers

17

The proper way to declare a flexible array member in a struct is as follows:

struct array {
    long length;
    int mem[];
};

Then you can allocate the space as before without having to assign anything to mem:

struct array *alloc_array( long length)
{
    struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
    arr->length = length;
    return arr;
}
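
For completeness, here is a minimal sketch of the flexible array member version with the error handling filled in. The negative-length check and the NULL return on failure are assumptions added for illustration; they are not part of the question's code, and C99 or later is required.

```c
#include <stdlib.h>

struct array {
    long length;
    int mem[];              /* flexible array member: must be the last member */
};

/* Returns NULL if length is negative or the allocation fails
   (both checks are illustrative additions). */
struct array *alloc_array(long length)
{
    if (length < 0)
        return NULL;
    /* sizeof *arr covers the header only; the flexible member adds nothing. */
    struct array *arr = malloc(sizeof *arr + sizeof(int) * (size_t)length);
    if (arr == NULL)
        return NULL;
    arr->length = length;
    return arr;
}
```

The caller still releases everything with a single `free(arr)`, and `arr->mem[i]` is valid for `0 <= i < arr->length`.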
dbush
  • 2
    This works only if the struct contains one variable length element. If there are more, then as far as I know pointer manipulation like I outlined is necessary. Is it legal with regard to strict aliasing rules? – ego Feb 09 '18 at 21:32
  • 3
    @ego Your question didn't specify that. In that case, you won't have any strict aliasing violations because the memory in question was malloc'ed and does not yet have a type, however you may run into alignment issues. – dbush Feb 09 '18 at 21:36
  • @ego Your best bet in that case would probably be to simply do separate allocations for the struct and the arrays it contains. You need to know something about what you free when you free it anyway, and you won't see any noticeable change in speed. – dbush Feb 09 '18 at 21:38
  • @dbush are you saying that doubling the number of `malloc` and `free` calls in the program will not cause a noticeable change in speed? – SergeyA Feb 09 '18 at 21:39
  • _flexible array member_ is the way to go as long as only one such member is needed at the end (and at least one other member exists). – chux - Reinstate Monica Feb 09 '18 at 21:45
  • Now if someone could just convince the developers of glibc that that is now the proper way -- we could all get on the same sheet of paper and dispense with the struct-hack forever, e.g. [glibc - struct dirent](https://sourceware.org/git/?p=glibc.git;a=blob;f=bits/dirent.h;h=8c38b8c7586d0deac6e7782e0803df85445331ed;hb=refs/heads/release/2.27/master) – David C. Rankin Feb 09 '18 at 23:53
  • Even if that works, why would you do something complicated and non-obvious, which as far as I can see offers no advantage over the normal, everyday method of simply assigning allocated memory to a pointer? – jamesqf Feb 10 '18 at 03:50
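
For the case raised in the comments above, where a struct needs more than one variable-length element, a single allocation is still possible if each array's offset is rounded up to that array's alignment. The following is only a sketch: `struct two_arrays`, `alloc_two_arrays` and `round_up` are hypothetical names chosen for illustration, `_Alignof` requires C11, and overflow of the size computation is not checked.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header followed by two arrays in the same allocation. */
struct two_arrays {
    size_t n_ints;
    size_t n_doubles;
    int *ints;
    double *doubles;
};

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

struct two_arrays *alloc_two_arrays(size_t n_ints, size_t n_doubles)
{
    /* Byte offsets of each array from the start of the block. */
    size_t ints_off = round_up(sizeof(struct two_arrays), _Alignof(int));
    size_t doubles_off = round_up(ints_off + n_ints * sizeof(int),
                                  _Alignof(double));
    size_t total = doubles_off + n_doubles * sizeof(double);

    unsigned char *raw = malloc(total);
    if (raw == NULL)
        return NULL;

    /* malloc returns storage aligned for any object type, so an aligned
       offset from raw gives an aligned address. */
    struct two_arrays *t = (struct two_arrays *)raw;
    t->n_ints = n_ints;
    t->n_doubles = n_doubles;
    t->ints = (int *)(raw + ints_off);
    t->doubles = (double *)(raw + doubles_off);
    return t;
}
```

As with the single-array version, one `free(t)` releases the header and both arrays.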
7

Modern C officially supports flexible array members. So you can define your structure as follows:

struct array {
    long length;
    int mem[];
};

And allocate it as you do now, without the added hassle of dubious pointer manipulation. It will work out of the box, every access will be properly aligned, and you won't have to worry about dark corners of the language. Naturally, it's only viable if you have a single such member to allocate.

As for what you have now: allocated storage doesn't have a declared type (it's a blank slate), so it only acquires an effective type when you store to it. The header is written through a struct array * and the elements through an int *, and those are disjoint regions, so you aren't breaking strict aliasing. The only potential issue is misalignment of the array, though that's unlikely with the types in your structure.
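
To rule out the alignment concern while keeping the pointer-based layout from the question, the offset of the array can be rounded up explicitly. This is a sketch rather than a definitive implementation: `round_up` is a hypothetical helper, the negative-length and NULL checks are added assumptions, and `_Alignof` requires C11. On common platforms the rounding is a no-op here, because `sizeof(struct array)` is already a multiple of the alignment of `int`.

```c
#include <stddef.h>
#include <stdlib.h>

struct array {
    long length;
    int *mem;
};

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

struct array *alloc_array(long length)
{
    if (length < 0)
        return NULL;
    /* First byte offset past the header that is suitably aligned for int. */
    size_t offset = round_up(sizeof(struct array), _Alignof(int));
    unsigned char *raw = malloc(offset + sizeof(int) * (size_t)length);
    if (raw == NULL)
        return NULL;
    struct array *arr = (struct array *)raw;
    arr->length = length;
    arr->mem = (int *)(raw + offset);   /* elements live right after the header */
    return arr;
}
```

The header and the elements still come from one `malloc` call, so a plain `free(arr)` remains sufficient.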

StoryTeller - Unslander Monica
1

I believe the code as written does violate strict aliasing rules when the standard is read in the strictest sense.

You are accessing an object of type int through a pointer to an unrelated type, struct array. I believe an easy way out would be to take the starting address of the struct, convert it to char*, and perform pointer arithmetic on it. Example:

void *alloc = malloc(...);
struct array *arr = alloc;
/* step past the header in bytes, then convert to the element type */
int *p_int = (int *)((char *)alloc + sizeof(struct array));
SergeyA
  • Is `(char*)alloc + sizeof(array);` certainly aligned for `int`? `alloc` is OK, yet `sizeof(array)` is not specified to be a multiple of the alignment of `int`. I'd expect that to fail only on a hostile or unicorn platform, though. – chux - Reinstate Monica Feb 09 '18 at 21:43
  • @chux I have all but forgotten about alignment. Haven't dealt with platforms which care about it for a while. I was focusing on aliasing question, but of course, alignment could be very important. – SergeyA Feb 09 '18 at 21:48
  • Same for me about alignment until a picky platform and a dozen bus-faults later. – chux - Reinstate Monica Feb 09 '18 at 21:50
  • @chux which one it was if you do not mind me asking? Last align-conscious platform I dealt with was Sparc around 8 years ago or so. – SergeyA Feb 09 '18 at 21:53
  • Various [PICs](https://en.wikipedia.org/wiki/PIC_microcontroller) these days that do not like 2-byte `int` on odd boundary. Since then I have become more alignment-aware. – chux - Reinstate Monica Feb 09 '18 at 21:54
  • In any case, even on platforms that tolerate unusual alignments (do not bus-fault), native alignment for the type can result in better performance. – chux - Reinstate Monica Feb 09 '18 at 21:59
  • @chux can? Sure. However, the platform I am using now exclusively, x86_64, does not care about alignment, and I have grown lax. – SergeyA Feb 09 '18 at 22:04
  • 1
    @SergeyA Yes it does, unaligned accesses are slower. However on ARM (the now-most-popular general purpose computing platform?) unaligned accesses do raise an exception. – user253751 Feb 10 '18 at 02:54
  • 1
    Sergey and @immibis: breaking C alignment rules on x86-64 *can* result in correctness problems, not just performance, when gcc auto-vectorizes: https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64. And BTW, unaligned access is only slower on modern x86-64 if it crosses a cache-line boundary (or maybe a 32-byte boundary on AMD CPUs). Also, if you misalign an `atomic_int` across a cache-line boundary, it won't be atomic anymore (for load or store, and for atomic RMW it will be *very* slow.) – Peter Cordes Feb 10 '18 at 03:05
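
When data genuinely has to be read from or written to an address that may be misaligned, a common way to stay within the C rules is to copy the bytes with `memcpy` instead of dereferencing a cast pointer. The helpers below are a sketch; `load_int_unaligned` and `store_int_unaligned` are hypothetical names, not functions from any library.

```c
#include <string.h>

/* Hypothetical helper: read an int stored at any byte address. */
static int load_int_unaligned(const void *p)
{
    int value;
    memcpy(&value, p, sizeof value);   /* byte-wise copy, no alignment requirement */
    return value;
}

/* Hypothetical helper: write an int to any byte address. */
static void store_int_unaligned(void *p, int value)
{
    memcpy(p, &value, sizeof value);
}
```

Because the access is expressed as a byte copy, the compiler cannot assume `int` alignment for `p`, which avoids the auto-vectorization problem described in the comment above.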