This question is really about how to use variable-length types in the Python/C API (PyObject_NewVar, PyObject_VAR_HEAD, PyTypeObject.tp_basicsize and .tp_itemsize), but I can ask it without bothering with the details of the API. Just assume I need to use an array inside a struct.
I can create a list data structure in one of two ways. (I'll just talk about char lists for now, but it doesn't matter.) The first uses a pointer and requires two allocations. Ignoring #includes and error handling:
struct listptr {
    size_t elems;
    char *data;
};

struct listptr *listptr_new(size_t elems) {
    size_t basicsize = sizeof(struct listptr), itemsize = sizeof(char);
    struct listptr *lp;
    lp = malloc(basicsize);
    lp->elems = elems;
    lp->data = malloc(elems * itemsize);
    return lp;
}
The second way to create a list uses array notation and one allocation. (I know this second implementation works because I've tested it pretty thoroughly.)
struct listarray {
    size_t elems;
    char data[1];
};

struct listarray *listarray_new(size_t elems) {
    size_t basicsize = offsetof(struct listarray, data), itemsize = sizeof(char);
    struct listarray *la;
    la = malloc(basicsize + elems * itemsize);
    la->elems = elems;
    return la;
}
In both cases, you then use lp->data[index] to access the array.
My question is why does the second method work? Why do you declare char data[1] instead of any of char data[], char data[0], char *data, or char data? In particular, my intuitive understanding of how structs work is that the correct way to declare data is char data with no pointer or array notation at all. Finally, are my calculations of basicsize and itemsize correct in both implementations? In particular, is this use of offsetof guaranteed to be correct for all machines?
Update
Apparently this is called the struct hack. In C99, you can use a flexible array member instead:
struct listarray2 {
    size_t elems;
    char data[];
};
with the understanding that you'll malloc enough space for data at runtime. Before C99, the data[1] declaration was common. So my question now is why declare char data[1] or char data[] instead of char *data or char data?
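For reference, this is roughly how I'd expect the C99 flexible array member version to be allocated. It is only a sketch under my current understanding: listarray2_new mirrors my earlier constructors, and listarray2_at is a hypothetical accessor I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc, free */

struct listarray2 {
    size_t elems;
    char data[];      /* C99 flexible array member: must be the last member */
};

/* Allocate the header and `elems` chars in one block; NULL on failure. */
struct listarray2 *listarray2_new(size_t elems) {
    struct listarray2 *la;

    /* sizeof(struct listarray2) does not count any elements of data,
     * so the element storage is added explicitly. */
    if (elems > SIZE_MAX - sizeof *la)
        return NULL;                  /* size computation would overflow */
    la = malloc(sizeof *la + elems * sizeof la->data[0]);
    if (la == NULL)
        return NULL;
    la->elems = elems;
    return la;
}

/* Hypothetical bounds-checked accessor, for illustration only. */
char *listarray2_at(struct listarray2 *la, size_t index) {
    assert(la != NULL && index < la->elems);
    return &la->data[index];
}
```

With this layout a single free(la) releases both the header and the data, whereas the pointer version needs free(lp->data) followed by free(lp).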