
I was reading about zero-length arrays: where and how they are used. As I understand it, they are used when you want a dynamically sized member at the end of a structure. This differs from using a pointer because:

  • It lets you allocate the memory for the struct and for the variable-length array at the end of the struct as one contiguous memory block.
  • If you used a pointer, you'd either have to allocate memory separately (two malloc calls, likely non-contiguous) or use some other tricks (to achieve proper alignment etc.).

as taken from this answer.


All of this made sense, until I read some older facts about this issue.

Fact 1

  • In ISO C90, you would have to give contents a length of 1, which means either you waste space or complicate the argument to malloc.

Fact 2

  • GCC allows this feature as an extension.

I don't really know what an extension means here, but what I am thinking about is Fact 1. More explanation follows.


Let's look at some code.
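#include <stdio.h>
#include <stdlib.h>

struct line
{
    int len;
    char* content;
};

int main(void)
{
    int i;
    struct line* p = malloc(sizeof(struct line) + 10);
    if (p == NULL)
        return 1;
    p->len = 10;
    /* make content point just past the int member */
    p->content = (char*)(&(p->len) + 1);

    /* %p expects a void * argument */
    printf("%p\n", (void*)&(p->len));

    for(i = 0; i < 10; i++)
    {
        printf("%p\n", (void*)&(p->content[i]));
    }

    free(p);
    return 0;
}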

The code I came up with effectively implements all the features of a zero-length array. Maybe it takes one extra line of code, but that is definitely better than the drawbacks described in Fact 1.

So my question is: was there any particular reason why the approach shown in the code above was not used, and instead GCC had to add an extension, or programmers had to use an array of size 1?

Haris
  • Zero-length array or array of 1 element ensures correct alignment constraint. – mikedu95 Jan 14 '16 at 16:55
  • `p->content = (char*)(&(p->len) + 1);` should just be `p->content = (char *) (p + 1);` to step past the structure. – unwind Jan 14 '16 at 16:56
  • @mikedu95, as far as I can understand, the above code will also ensure alignment. – Haris Jan 14 '16 at 16:56
  • @unwind, it just has to go past the `int len`. – Haris Jan 14 '16 at 17:00
  • @Haris no, you do not want a pointer which points to itself. It should point to the first (aligned) byte after the struct. – mch Jan 14 '16 at 17:01
  • @mch, it should not point to the first aligned byte. It should point to the byte right after the second-to-last member of the struct. That's what I understood from zero-length arrays. – Haris Jan 14 '16 at 17:03
  • With this code, `p->content[0]` (a place to store a `char`) overlaps `p->content` ( a pointer) negating the ability to store arbitrary data in `p->content[0]`. IMO, this is a workable implementation of zero-length arrays, but not non-zero-length arrays. – chux - Reinstate Monica Jan 14 '16 at 17:09
  • @chux, the same would happen with the actual zero-length array representation. See here --> http://ideone.com/JWcxUn – Haris Jan 14 '16 at 17:15
  • The difference is that your `content` is the **value** of the field `content`. In your referenced example, `content` is the **address** of the 1st element of the array `content[]`. Your solution requires explicitly saving the address, so that storage is not available for other uses; the referenced example deduces the address from `p`. With this approach, code cannot save a `char` in `p->content[0]` without messing up `content`. – chux - Reinstate Monica Jan 14 '16 at 17:18
  • @chux, in the referenced example the **value** of the field `content` itself is the **address** of the 1st element, and the same is true of the actual zero-length array implementation. The only difference is that in the actual implementation `content` is an array, while here `content` is a pointer, and the usual differences between arrays and pointers apply. As for getting the address from `p`, that's just an extra line of code; it is certainly better than how it was done in the ISO C90 days. – Haris Jan 14 '16 at 17:27
  • With this approach, code cannot use the first few `p->content[i]` to store array data. – chux - Reinstate Monica Jan 14 '16 at 17:31
  • No. To the compiler, `p->content` is a pointer, which means that it has to dereference it in order to get at the value. If you use that space for the first characters of your line, you will invalidate the pointer. Had the field been an array, `p->content` would just be a pointer to the first element of the array inside the structure. Arrays are not pointers in the scope where they are declared. They only decay into pointers when they are passed to functions. – M Oehm Jan 14 '16 at 17:32
  • @MOehm, yes. That's what chux's answer also points out. But I made a few changes based on unwind's comment; see my comment on chux's answer. – Haris Jan 14 '16 at 17:39
  • Problems: 1) You are using more space than an array of length 1 (if the object type has a size less than a pointer). 2) Your data may not be correctly aligned (it works for char but not so well for other types). 3) Your struct is no longer trivially copyable. – Martin York Jan 14 '16 at 18:30

2 Answers


This approach does not work when len > 0.

The `content` field ends up holding two things at once: the address of the array and the first elements of the array.

Here is a rewrite of the code that demonstrates the effect.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memset */

struct line
{
    int len;
    char* content;
};

int main(void)
{
    int i;
    size_t size = sizeof(struct line) + 10;
    struct line* p = malloc(size);
    if (p == NULL)
        return 1;
    memset(p, 0, size);
    p->len = 10;
    p->content = (char*)(&(p->len) + 1);

    printf("%p\n", (void*) &(p->len));

    /* print each element's address and the byte stored there */
    for(i = 0; i < 10; i++)
    {
        printf("%p %02hhX\n", (void*) &(p->content[i]), p->content[i]);
    }

    free(p);
    return 0;
}

Output

0x80028260
0x80028264 64  // content[0..3] hold the bytes of the content pointer itself (0x80028264, little-endian)
0x80028265 82
0x80028266 02
0x80028267 80
0x80028268 00
0x80028269 00
0x8002826a 00
0x8002826b 00
0x8002826c 00
0x8002826d 00
chux - Reinstate Monica
  • Oh yes. I got it. But then what unwind suggested would work perfectly, because that would make `content` point to the memory just next to where the pointer `content` itself is stored in the struct. see this --> http://ideone.com/ZHQw1Q – Haris Jan 14 '16 at 17:38
  • @Haris, that revision idea is true, but the revised code takes up more space (sizeof pointer) than truly needed. So it's clarity vs. efficiency. The space needed is really smaller: `size_t size = sizeof(p->len) + 10*sizeof(*(p->content));` (ignoring alignment issues for now) – chux - Reinstate Monica Jan 14 '16 at 17:45
  • Yes, that is true. It does take more space: extra space to store the pointer. – Haris Jan 14 '16 at 17:49
  • @Haris: Your example works, but you are probably aware that instead of potentially wasting 1 char you now waste the space of an additional pointer and incur the cost of an additional dereference. I wouldn't worry too much about that. You are not likely to manage the memory in client code. Use the portable standard approach of a single-item array and write a nice constructor that does the memory calculations for you. – M Oehm Jan 14 '16 at 17:50
  • @MOehm, Yes. I get it now. :) – Haris Jan 14 '16 at 17:52
  • @Haris Agree with M Oehm, something like `struct line2 { int len; char content[1]; } *p; size_t sz = sizeof *p - sizeof *(p->content) + N*sizeof *(p->content)`. – chux - Reinstate Monica Jan 14 '16 at 17:53
  • @Haris: I didn't want to hammer the point home, I'm just slow with the typing tonight, so I hadn't seen chux's earlier reply. – M Oehm Jan 14 '16 at 17:56
  • @MOehm, I understand. I just acknowledged your comment, nothing else. :) – Haris Jan 14 '16 at 17:58

The reason you'd declare char content[1] instead of your approach is to keep your structure directly serializable and deserializable. Think about what would happen if you wrote your line structure over a network socket or to a file for another application to read. The line->content pointer would be completely invalid for the receiving application.

If you declare the content as an array of size one, you don't have the above problem. However you do have to slightly complicate your malloc call.

mshildt
  • Why would the `line->content` pointer be completely invalid for the receiving application? It is laid out properly in memory in one contiguous block. – Haris Jan 14 '16 at 17:48
  • Because when the receiving application `mallocs` space to hold the structure the chances of it being stored at the same memory address as the source application had it are extremely low. Remember, the `content` field takes up 4 (or 8 for 64-bit) bytes of space to store the memory address directly following the structure. This memory address will differ (most likely) between source and receiving application. – mshildt Jan 14 '16 at 17:51
  • OK, I get what you are saying now. That is also a valid point. But then the sending application has to send the `len` and then read from the `content` pointer and send the `char`s, rather than sending the whole structure. – Haris Jan 14 '16 at 17:57
  • @epicbrew: Even with flexible array members, you still have to figure out how much memory to `malloc`. Applying `sizeof` to such a structure omits the size of the array member. – Keith Thompson Jan 14 '16 at 19:44
  • @KeithThompson Yes this is true, my previous comment didn't mean to imply you could neglect to calculate the size of the trailing array for the `malloc` call. Rereading it now I can see how my above comment could be confusing. I should probably delete it. – mshildt Jan 14 '16 at 19:46
  • It's worth noting that C99 provides [Flexible Array Members](https://en.wikipedia.org/wiki/Flexible_array_member) as yet another solution to this problem. – mshildt Jan 14 '16 at 19:50