Dynamic arrays with an embedded meta-information struct

Question

I am fiddling around with an implementation of a generic dynamic array. The array should hold information about its size, how many entries are used, and then hold the actual data. The meta-information (size/used) is generic, but the data needs to handle different types, so I am handling that with macros. I am trying, however, to get the memory allocation code into functions. So my thought it is: I have a struct for meta-information

struct da_meta {
  size_t size;
  size_t used;
};

and then I have a macro that creates a struct per type, using a flexible array following the meta information:

#define dynarray(TYPE)                 \
struct {                               \
  struct da_meta meta;                 \
  TYPE   data[];                       \
}

I can declare an integer array, for example, as

  dynarray(int) *int_array = 0;

To allocate and reallocate arrays, my thought was now to use code such as this:

#define size_overflow(meta_size, obj_size, len) \
  ((SIZE_MAX - meta_size) / obj_size < len)

// Always free if we cannot reallocate
void *realloc_dynarray_mem(void *p,
                           size_t meta_size,
                           size_t obj_size,
                           size_t new_len)
{
  if (size_overflow(meta_size, obj_size, new_len))
    goto abort;

  struct da_meta *new_da =
    realloc(p, meta_size + obj_size * new_len);
  if (!new_da) goto abort;

  new_da->size = new_len;
  new_da->used = MIN(new_da->used, new_len);

  return new_da;

abort:
  free(p);
  return 0;
}

The function gets the size of the struct sans the data objects, the size of individual objects, and the number of objects to allocate memory for. I don't use the size of the struct meta type, because it might be too small depending on the alignment of the data objects, but I will get it from sizeof the concrete (typed) structures. The function will always free the input and return NULL if I cannot allocate because in my application I have to give up if I cannot grow the array, so I don't try to preserve the old data in case there is an error.

There is nothing wrong with this code, as far as I can tell. I can always allocate memory, and as long as I have more than the size of struct meta, I can set the variables there. But when I return the result and use it as a dynarray(T) type, I am less sure. I think it should work, because C should put the memory of the first member of a struct first in a struct, and that is where I put struct meta, but am I right here?

I create a new array like this:

void *new_dynarray_mem(size_t meta_size,
                       size_t obj_size,
                       size_t len)
{
  struct da_meta *array =
    realloc_dynarray_mem(0, meta_size, obj_size, len);
  if (array) {
    // we do set size in realloc, but
    array->size = len;
    // if used was not initialised in realloc (and it wasn't)
    // then we have to set it here...
    array->used = 0;
  }
  return array;
}

#define new_da(type, init_size)                  \
  new_dynarray_mem(sizeof(dynarray(type)),       \
                   sizeof(type), init_size)

Here, the macro new_da() gets the size of the header/meta information from sizeof(dynarray(type)) and the size of the underlying types from sizeof(type). The second value is fine, but I am also uncertain about the first. Does the C standard guarantee that if I create two different structs with exactly the same code, e.g., calling dynarray(int) twice, that I get the same memory layout? I cannot imagine a compiler that would give me a different layout for the same code, but when it comes to imagining what compilers get up to, I am quite limited.

For appending to the array, I think all is fine. There I do not generate new types but get the size from the existing dynamic array, so if the first allocation is standard compliant, then I think the appending is as well, but I could be wrong.

#define da_free(da)                              \
  do { free(da); da = 0; } while(0)

#define grow(size)                               \
  (((size) == 0) ? /* special case for zero */   \
    1 :                                          \
    ((size) > SIZE_MAX / 2) ? /* can we grow? */ \
      0 : /* no, then report size zero */        \
      (2 * (size))) /* double the size */

#define da_append(da, ...)                      \
do {                                            \
  if (da->meta.used == da->meta.size) {         \
    size_t new_size = grow(da->meta.size);      \
    if (new_size == 0) { da_free(da); break; }  \
    da = realloc_dynarray_mem(                  \
      da, sizeof *da, *da->data, new_size       \
    );                                          \
    if (!da) break;                             \
  }                                             \
  da->data[da->meta.used++] = __VA_ARGS__;      \
} while (0)

Am I guaranteed that if I lay out the concrete dynamic arrays with the meta-information at the top of the structs, then I can treat the allocate memory as both a pointer to the meta-information and the array? Is it safe to assume that I get the same size and memory layout if I generate the same struct twice? I feel that it must be that way since it shouldn't differ from if I include the same header file twice, but since I am generating the code there might be something that I am missing.

EDIT Based on the comments, I have updated the code to that below, but I have left the original code (of course) so the comments make sense in terms of that.

#define da_at(da,i)  (da->data[(i)])
#define da_len(da)   (da->meta.used)

struct da_meta {
  size_t size;
  size_t used;
};

#define dynarr(TYPE)                   \
struct {                               \
  struct da_meta meta;                 \
  TYPE   data[];                       \
}

// Always free if we cannot reallocate
void *realloc_dynarray_mem(struct da_meta *p,
                           size_t meta_size,
                           size_t obj_size,
                           size_t new_len)
{
  // Size size overflow?
  if (((SIZE_MAX - meta_size) / obj_size < new_len))
    goto fail;

  struct da_meta *new_da =
    realloc(p, meta_size + obj_size * new_len);
  if (!new_da) goto fail;

  new_da->size = new_len;
  new_da->used = MIN(new_da->used, new_len);

  return new_da;

fail:
  free(p);
  return 0;
}

void *new_dynarray_mem(size_t meta_size,
                       size_t obj_size,
                       size_t len)
{
  struct da_meta *array =
    realloc_dynarray_mem(0, meta_size, obj_size, len);
  if (array) array->used = 0;
  return array;
}

void *grow_dynarray_mem(struct da_meta *p,
                        size_t meta_size,
                        size_t obj_size)
{
  // Can we double the length?
  size_t used = meta_size - obj_size * p->size;
  size_t adding = MAX(1, p->size);
  if ((SIZE_MAX - used) / obj_size < adding) {
    free(p);
    return 0;
  }

  return realloc_dynarray_mem(
    p, meta_size, obj_size, p->size + adding
  );
}

#define new_da(da, init_size)                    \
  new_dynarray_mem(sizeof *(da),                 \
                   sizeof *(da)->data,           \
                   (init_size))

#define da_free(da)                              \
  do { free(da); da = 0; } while(0)

#define da_append(da, ...)                       \
do {                                             \
  if (da->meta.used == da->meta.size) {          \
    da = grow_dynarray_mem(                      \
      (struct da_meta *)da,                      \
      sizeof *da, sizeof *da->data               \
    );                                           \
    if (!da) break;                              \
  }                                              \
  da->data[da->meta.used++] = __VA_ARGS__;       \
} while (0)

When used, the code can look like this:

int main(void)
{
  dynarr(int) *int_array = new_da(int_array, 0);
  if (!int_array) goto error;
  printf("%zu out of %zu\n",
         int_array->meta.used,
         int_array->meta.size);

  for (int i = 0; i < 5; i++) {
    da_append(int_array, i);
    if (!int_array) goto error;
  }

  for (int i = 0; i < da_len(int_array); i++) {
    printf("%d ", da_at(int_array, i));
  }
  printf("\n");

  da_free(int_array);

  return 0;

error:
  return 1;
}

As advised in this http://www.rkoucha.fr/tech_corner/c_preprocessor.html (c preprocessor best practices), use parentheses around parameters in da_append(), size_overflow() — Rachid K., Oct 21 '20 at 08:00
In the "abort" section of the code of realloc_dynarray_mem(), "p" may be NULL. Even if GLIBC accepts NULL as parameter for free(), I would check p before calling free(). — Rachid K., Oct 21 '20 at 08:08
True about the parentheses, although I think that append will break if da isn’t a variable anyway. For free(), the standard has allowed NULL at least since C99. — Thomas Mailund, Oct 21 '20 at 08:11
`Even if GLIBC accepts NULL as parameter for free()` Any free accepts NULL. — KamilCuk, Oct 21 '20 at 08:11
Defining a variable in a macro is generally a bad idea because the variable name may conflict with other variables defined (either locally or globally). So, instead of using "new_size" variable in da_append(), I would do the check/calculation of the new_size in the realloc_dynarray_mem() function. — Rachid K., Oct 21 '20 at 10:46
Not sure why you need a variable number of arguments in the da_append() macro. — Rachid K., Oct 21 '20 at 10:47
The `new_size` variable gets its own scope as it is in its own block. So there shouldn't be a problem there. But it is gone again, now, when I have moved the growing code to a function. The variadic macro is necessary to handle things like struct initialisers. E.g. with `struct point { double x,y };` and an array `dynarray(struct point)` you should be able to `da_append(array, (struct point){ .x = 0, .y = 0 });`, but the comma in the struct data is seen as an argument separator by the preprocessor. With ..., it takes all the code after the array and puts it into the assignment. — Thomas Mailund, Oct 21 '20 at 12:26
@ThomasMailund, GCC compiler will raise a warning such as "declaration of 'new_size' shadows a previous local or previous global" when used with the -Wshadow flag. The latter flag is widely used in big software (in my company it is a requirement from the quality team) because it avoids using a variable instead of another. Hence, we forbid the use of variables in macros or use some special names like `__new_size__`... — Rachid K., Oct 21 '20 at 14:59
It is a warning in fact with -Wshadow but the company enforces -Werror along with it : so the warning becomes an error. — Rachid K., Oct 21 '20 at 15:09
Yes, I can see how you can object to it as a style issue. It is, of course, better to avoid it as in the new code. — Thomas Mailund, Oct 21 '20 at 15:17

KamilCuk · Answer 1 · 2020-10-21T08:27:46.777

Just remember about padding between between meta and the start of the array and about alignment requirements and you should be fine.

because C should put the memory of the first member of a struct first in a struct, and that is where I put struct meta, but am I right here?

Yes.

Am I guaranteed that if I lay out the concrete dynamic arrays with the meta-information at the top of the structs, then I can treat the allocate memory as both a pointer to the meta-information

Yes, and...

and the array?

No. The array starts at address after meta + padding. So at address (char*)da + sizeof(dynarray(TYPE)) or just da->data.

Is it safe to assume that I get the same size and memory layout if I generate the same struct twice?

No and yes. There are many other great stackoverflow questions and answers about that topic - research them. Pragmatically yes, it would be a strange compiler that would would generate different padding for the same looking struct, but technically that's allowed.

using a flexible array

Unless you have specific aim, then I would just advise not to use them. It makes it harder for you to write the code. It makes it very hard to create and manage an array of such arrays.

goto abort;

What an unfortunate name for a goto label - abort() is a standard function.

#define grow(size)

Please use a prefix to all your library functions, especially macros. Defining such macro will make it impossible to use it in other code that happens to use a different grow() function. da_ seems like a good prefix.

I guess *da->data in realloc_dynarray_mem should be sizeof(*da->data).

@edit

I agree with most, but why isn’t da freed in append? That is why i make the reallocate function do. I explicitly free the argument if realloc() returns NULL. — Thomas Mailund, Oct 21 '20 at 08:16
Och, there it is! My fault. So it just set's `da` to 0 on allocation error? Ok, I get it. So the user has to `if (array == 0) { handle_error() }` ok. — KamilCuk, Oct 21 '20 at 08:25
Yes, the idea was to report errors that way. I have another solution that keeps the memory around so a user can still get it, but error handling is more complicated and I don’t need it, so I changed it to this. — Thomas Mailund, Oct 21 '20 at 08:35

Rachid K. · Answer 2 · 2020-10-21T12:36:22.283

0

I would suggest to use typeof keyword in new_da(). This would avoid specifying the type twice: in dynarray(TYPE) and in new_da(type, init_size). To make it, instead of passing the type, just pass the pointer on the dynamic array:

#define new_da(da, init_size)            \
  (da) = new_dynarray_mem(sizeof(dynarray(typeof(*(da)))),  \
              sizeof(typeof((da)->data[0])), (init_size))

Hence, this would avoid the mistake where the type used in the definition would differ from the type used in the allocation:

dynarray(int) *pInt;
pInt = new_da(char, 1024);

UPDATE FROM DISCUSSION IN COMMENTS:

And what about a single macro to define and initialize ?

#define new_da(da, type, init_size) \
        dynarray(type) *da = new_dynarray_mem(sizeof(dynarray(type)), sizeof(type), init_size)

edited Oct 21 '20 at 12:36

answered Oct 21 '20 at 10:43

Rachid K.

4,490
3
11
30

I wanted to combine definition and initialisation, so `dynarray(int) *da = new_da(int,10);`, which is why I don't add `da` to the macro. In any case, if I did, I wouldn't need `typeof()` (which is a compiler extension and not in the standard). I could just write `new_dynarray_mem(sizeof *(da), sizeof *(da)->data, (init_size))` as I do in the append code. Once I have something of a type, I don't need to get the type unless I want to define another variable of that type. I can always get the `sizeof`. – Thomas Mailund Oct 21 '20 at 11:30
I understand, but as a general purpose library, the user could use it as in my example. So, for robustness purposes, you should not provide the opportunity to make the mistake. – Rachid K. Oct 21 '20 at 11:33
Moreover, if you restrict the allocation of your array with "dynarray(int) *da = new_da(int,10);", how do you define "da" as a global variable if it is needed ? There are some tricks to make it but not simple at first sight... The compiler will not accept this construct in a global scope. – Rachid K. Oct 21 '20 at 11:38
If he writes code like that, he starts with an uninitialised pointer, which is as much of a problem. I am not happy that you need to provide the type twice to define an array, but I dislike the other code more. Especially because it makes it easy to write code such as `new_da(da, 10) ; new_da(da, 10)` which leaks memory. The other operations leave the array in a consistent state, and free it if they cannot handle memory allocation. If `new_da()` assigns to the pointer, it should either free it first (which means that it *must* be initialised as NULL when defined), or it easily leads to leaks. – Thomas Mailund Oct 21 '20 at 11:40
Sorry, but I can see another weakness : the compiler will accept without raising any error : dynarray(int) *pInt = new_da(char *, 1024); – Rachid K. Oct 21 '20 at 11:40
Yes, if the types do not match at the initialisation point, then you are in trouble. I am aware. It is not ideal. I find it better than the alternative, though. It is true that I cannot initialise it as a global variable with that expression, so there I need two statements. It is not the common usage of a dynamic array, I suspect, though. It is, again, only a single place where I need the types to match. – Thomas Mailund Oct 21 '20 at 11:44
Of course, if I could get the type from what I assign to, that would be wonderful, but that isn't possible. I had another implementation where I used initialisation code like what you propose (but where the array was stack allocated and only the data was on the heap), and I played around with it a lot. I just never liked the two-step initialisation. You get the type check in the initialisation, yes, but you introduce a use pattern that I don't like; not initialising as soon as you declare the array. I guess it is a matter of taste. – Thomas Mailund Oct 21 '20 at 11:46
Ok, I am surprised, but this works for me. Define the allocation as `#define new_da(da, init_size) new_dynarray_mem(sizeof (da), sizeof *(da)->data, (init_size))` (so it takes a pointer we are assigning to and get the sizes from there), and then I can declare an array like `dynarr(int) *int_array = new_da(int_array, 0);`. Whether this is a compiler extension or true to the standard I do not know, but it gives us the best of both worlds. – Thomas Mailund Oct 21 '20 at 12:12
No, it should work. It is just expanding to the usual `T *p = malloc(sizeof *p);` kind of code, and if that works, then this should also work. Excellent :) – Thomas Mailund Oct 21 '20 at 12:14
I would rather see : `#define new_da(da, init_size) new_dynarray_mem(sizeof (*da), sizeof *(da)->data, (init_size))` otherwise you get the size of the "da" pointer not the size of the data structure it points to. – Rachid K. Oct 21 '20 at 12:21
And what about a single macro to define and initialize : `#define new_da(da, type, init_size) dynarray(type) *da = new_dynarray_mem(sizeof(dynarray(type)), sizeof(type), init_size)` – Rachid K. Oct 21 '20 at 12:28
yes, I thought about doing it with a combined variable declaration and initialisation as well. It is not a bad solution, I just don't like that it doesn't "look" like C for declaring and assigning. Somehow `T x = y` or `T x; x = y` is what I want it to look like, not `decl(T,x,y)`. The last looks weird to me in C code. It is only aesthetics, though, the solution works fine. I guess I just prefer if I can get something that looks like other C types. With taking the variable as part of the initialiser, and at the same time returning the value, it looks sufficiently C-like to me ;) – Thomas Mailund Oct 21 '20 at 12:33
I mean, T x = f(x) does look a little weird, but we accept it in `T *x = malloc(sizeof *x);` so it is not entirely new, and I think I can live with that. – Thomas Mailund Oct 21 '20 at 12:33

Dynamic arrays with an embedded meta-information struct

2 Answers2