
In C, one pattern for implementing generic containers, particularly vectors/dynamic arrays, is to allocate a header containing metadata before the memory that contains the user's actual data. The user's handle to the container is then an appropriately typed pointer to the start of his or her actual data. This pattern is used by Simple Dynamic Strings and stb_ds, for example.

I want to use the same pattern, but I'm concerned about accidentally relying on undefined behavior. In particular, I'm worried about the parts of the Standard that talk about effective types and that prohibit accessing an array out of bounds.

Hence, I was hoping that someone could check two snippets of code that achieve what I want to do in different ways. (Please excuse the void* casts - they may be superfluous in these examples but will occur in my actual implementation.)

My header struct for a vector is:

typedef struct
{
    alignas( max_align_t )
    size_t size;
    size_t capacity;
} Header;
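
The layout property I'm relying on here can be checked at compile time. The following is only a minimal sketch, assuming C11 with `<assert.h>` and `<stdalign.h>` available; the two assertions are added for illustration and are not part of my actual implementation:

```c
#include <assert.h>   // static_assert
#include <stdalign.h> // alignas, alignof
#include <stddef.h>   // size_t, max_align_t

typedef struct
{
    alignas( max_align_t )
    size_t size;
    size_t capacity;
} Header;

// The over-aligned first member makes the whole struct at least as strictly
// aligned as max_align_t ...
static_assert( alignof( Header ) >= alignof( max_align_t ),
               "Header must be at least as aligned as max_align_t" );

// ... and a struct's size is always a multiple of its alignment, so the byte
// right after a Header is suitably aligned for any ordinary element type.
static_assert( sizeof( Header ) % alignof( max_align_t ) == 0,
               "data placed after the Header must stay aligned" );
```

If either assertion failed, the user's handle could end up misaligned for some element types, regardless of how the pointer arithmetic is done.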

In the first snippet, I navigate between the header and the user's handle via arithmetic on a pointer of type Header*:

//Step 1: Allocate memory for header and 100 ints and initialize the header
Header *original_hdr_ptr = malloc( sizeof( Header ) + sizeof( int ) * 100 );
original_hdr_ptr->size = 0;
original_hdr_ptr->capacity = 100;

//Step 2: Get the user's handle - does this invoke any undefined behavior?
int *users_handle = (void*)( original_hdr_ptr + 1 );
users_handle[ 0 ] = 12345; //Example use by user

//Step 3: Recover pointer to header
Header *new_hdr_ptr = (Header*)(void*)users_handle - 1;
//Is new_hdr_ptr now a valid, usable pointer to the header?
//Or have I violated the no-array-access-out-of-bounds requirement?

In the second snippet, I navigate between the header and the user's handle only via arithmetic on char* pointers:

//Step 1: Allocate memory for header and 100 ints and initialize the header
Header *original_hdr_ptr = malloc( sizeof( Header ) + sizeof( int ) * 100 );
original_hdr_ptr->size = 0;
original_hdr_ptr->capacity = 100;

//Step 2: Get the user's handle, this time via arithmetic on a char pointer
int *users_handle = (void*)( (char*)original_hdr_ptr + sizeof( Header ) ); //Does this invoke any undefined behavior?
users_handle[ 0 ] = 12345; //Example use by user

//Step 3: Recover pointer to header, again via arithmetic on a char pointer 
Header *new_hdr_ptr = (Header*)( (char*)(void*)users_handle - sizeof( Header ) );
//Is new_hdr_ptr now a valid, usable pointer to the header?
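
For concreteness, here is roughly how the second approach might look once wrapped into helper functions. This is a sketch under assumptions: the names `vec_alloc`, `vec_header`, and `vec_free` are hypothetical and exist only for illustration, and whether the round trip through `char*` is strictly conforming is exactly what I'm asking about.

```c
#include <assert.h>   // static_assert
#include <stdalign.h> // alignas, alignof
#include <stddef.h>   // size_t, max_align_t
#include <stdint.h>   // SIZE_MAX
#include <stdlib.h>   // malloc, free

typedef struct
{
    alignas( max_align_t )
    size_t size;
    size_t capacity;
} Header;

static_assert( sizeof( Header ) % alignof( max_align_t ) == 0,
               "data placed after the Header must stay aligned" );

// Hypothetical helper: allocates a header followed by room for `capacity`
// elements of `elem_size` bytes each. Returns the user's handle, or NULL on
// allocation failure or size overflow.
static void *vec_alloc( size_t elem_size, size_t capacity )
{
    // Reject requests whose total size would overflow size_t.
    if( elem_size != 0 && capacity > ( SIZE_MAX - sizeof( Header ) ) / elem_size )
        return NULL;

    // The cast is redundant in C; it is kept only for C++ compatibility.
    Header *hdr = (Header*)malloc( sizeof( Header ) + elem_size * capacity );
    if( !hdr )
        return NULL;

    hdr->size = 0;
    hdr->capacity = capacity;

    // Step 2 from the snippet above: header -> handle via char* arithmetic.
    return (char*)hdr + sizeof( Header );
}

// Hypothetical helper: Step 3 from the snippet above, handle -> header.
static Header *vec_header( void *users_handle )
{
    return (Header*)( (char*)users_handle - sizeof( Header ) );
}

// Hypothetical helper: frees the whole allocation given the user's handle.
static void vec_free( void *users_handle )
{
    if( users_handle )
        free( vec_header( users_handle ) );
}
```

A user would then write something like `int *v = vec_alloc( sizeof( int ), 100 );`, index `v` directly, and call `vec_header( v )->capacity` to read the metadata.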

Does either snippet invoke undefined behavior in C or C++?

Thanks!

Jackson Allan
  • The result of malloc is guaranteed to be suitably aligned for `max_align_t`. There is no need to put `alignas( max_align_t )` in the header. – tstanisl Dec 02 '21 at 00:20
  • Consider using an aligned flexible array member: `typedef struct {size_t size; size_t capacity; alignas( max_align_t ) char data[]; } Header;`. This solves the alignment issue for the user's pointer. – tstanisl Dec 02 '21 at 00:25
  • @tstanisl ```alignas( max_align_t )``` here guarantees that the struct's size will be a multiple of ```alignof( max_align_t )``` and therefore that data after the header will be appropriately aligned. The problem with flexible array members is that they are non-standard in C++, and I want to maintain C++ compatibility. – Jackson Allan Dec 02 '21 at 00:28
  • FWIW, the musl implementation of malloc uses approach #2, though with hardcoded offsets rather than sizeof(). https://git.musl-libc.org/cgit/musl/tree/src/malloc/malloc.c?id=e5d78fe8df9bd61940abcd98ad07ed69b7da4350 That doesn't mean it's not UB, of course, but you would at least be in good company. :) – Nick ODell Dec 02 '21 at 00:30
  • Ok, so `x+1` is preferred over `x.data` for C++ compatibility. This may be one of those rare cases when the C/C++ tag is appropriate. Anyway, as long as C is concerned I would be not so strict about the standard. Recently, I asked a question (https://stackoverflow.com/q/70185038/4989451), and it looks like even trivial operations on dynamic memory are likely to be UB. In my observation, as long as one does pointer arithmetic via `char*` and does not violate alignment or the strict aliasing rule when accessing an object through a pointer, the code will work fine on all compilers. – tstanisl Dec 02 '21 at 07:41
  • @tstanisl *as long as C is concerned I would be not so strict about the standard* [I wouldn't put it that way](https://www.reddit.com/r/cpp/comments/g6ipqa/gccs_strict_aliasing_is_getting_crazy/). The original ANSI C standard wasn't about creating a new language - it was to standardize an already-existing language. Just because the current C standard might have a few holes in it because of its history doesn't mean it's "safe" to clearly violate the C standard in cases where the violation is clear. – Andrew Henle Dec 02 '21 at 08:28
  • @AndrewHenle, I meant that following the letter of the standard often leads to absurdities like the one in my question. One should rather follow the spirit of the standard, and one of its principles is to prevent compilers from breaking existing code. So if some language construct is in common use, the standard should not make it incorrect. – tstanisl Dec 02 '21 at 10:17
  • @tstanisl Your linked question is indeed closely related to mine. The no-array-access-out-of-bounds-rule seems to technically treat a lot of malloc-related patterns that are common and idiomatic as UB. For my query, the question of pointer provenance also comes into play when converting the header pointer to the user's handle and back again. I guess the version that uses ```char*``` arithmetic is "safer", although the "issue" you asked about in your query will still apply to the user's access of his or her data in the vector/dynamic array. – Jackson Allan Dec 03 '21 at 01:58
