25

In C, I am trying to do the following:

typedef struct {
    int length;
    int items[];     /* 1 */
} wchararray_t;

typedef struct {
    long hash;
    wchararray_t chars;   /* 2 */
} string_t;

static string_t s1 = {
    617862378,
    { 5, { 'H', 'e', 'l', 'l', 'o' } }  /* 3 */
};

In other words, I would like a type string_t that ends in another type wchararray_t that is itself dynamically sized -- its size being stored in length. Moreover, I would also like to write a particular prebuilt string as static data, here s1 of length 5.

The code above assumes C99 support for /* 1 */. The inclusion of the substructure into the bigger structure at /* 2 */ is, as far as I understand, not supported even by the C99 standard -- but GCC accepts it. However, at /* 3 */ GCC gives up:

error: initialization of flexible array member in a nested context
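For contrast, here is a minimal sketch of the non-nested case. As a GNU extension, GCC accepts a static initializer for a flexible array member when the struct is a top-level object; only the nested form at /* 3 */ is rejected. The names `hello` and `sum_items` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int length;
    int items[];            /* C99 flexible array member */
} wchararray_t;

/* GNU extension: a flexible array member may be statically initialized
   when the struct is a top-level object, as here (not when nested). */
static wchararray_t hello = { 5, { 'H', 'e', 'l', 'l', 'o' } };

/* Hypothetical helper: adds up the items, trusting the stored length. */
static int sum_items(const wchararray_t *a)
{
    int i, total = 0;
    assert(a != NULL);
    for (i = 0; i < a->length; ++i)
        total += a->items[i];
    return total;
}
```

Calling `sum_items(&hello)` walks all five items. Note that `sizeof(hello)` still reports only the size of the fixed part, so the length field remains the only record of how many items follow.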

As a workaround, the ideal code above is so far written as the following hack, which "kind of works":

typedef struct { int length; int items[1]; } wchararray_t;
typedef struct { long hash; wchararray_t chars; } string_t;

typedef struct { int length; int items[5]; } wchararray_len5_t;
typedef struct { long hash; wchararray_len5_t chars; } string_len5_t;

static union { string_len5_t a; string_t b; } s1 = {
    617862378,
    { 5, { 'H', 'e', 'l', 'l', 'o' } }
};

...and we'd use "s1.b" as the prebuilt string_t (and never refer to "s1.a", which is here only for the static declaration of s1). However, it breaks in the newest GCC 4.8, which optimizes away parts of our code because -- obviously -- any loop over the items of a wchararray_t can iterate only once, given that it is an array of length 1.

This particular issue is fixed by giving gcc the option -fno-aggressive-loop-optimizations. It can probably also be fixed by not declaring the length in wchararray_t's items[] array, making it a dynamic array "just because". However, this way to write code is such a hack that I'd prefer a fully different way to approach the problem...

(Note that it is all generated C code produced by PyPy, as opposed to hand-written code; any change is fine, including if it requires changing the way we access the data everywhere, as long as the "valid" C optimizations are not prevented.)

EDIT: replaced "char[]" with "int[]", which doesn't accept the double-quote syntax "hello". This is because I'm looking for a solution for any array type.

NOT RESOLVED: thanks, everybody, for your suggestions. It seems there is no clean way, so I have implemented the hackish solution: declaring the types k+1 times, once with a flexible array "int items[];" and k more times with "int items[N];" for each value of N that is needed. This requires some additional hacks: e.g. not using flexible arrays for MSVC (they work differently there; I didn't investigate whether exactly the same syntax would work); and GCC follows what C99 says and is not happy with structs that would contain "int items[];" as their only field. It is however happy if we add a dummy field "char _dummy[0];", which is not strictly C99 as far as I know.
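The "k+1 declarations" idea could be sketched roughly as follows. This is only an illustration under assumptions, not the actual generated PyPy code: the macro `DECLARE_STRING_LEN`, the helpers `layouts_match` and `last_item`, and the second object `s2` with its hash value are all invented here, and the nested flexible array relies on GCC accepting it, as described in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Generic view; nesting a struct with a flexible array is a GCC extension. */
typedef struct { int length; int items[]; } wchararray_t;
typedef struct { long hash; wchararray_t chars; } string_t;

/* Hypothetical generator macro: one fixed-size twin per needed length N. */
#define DECLARE_STRING_LEN(N)                                             \
    typedef struct { int length; int items[N]; } wchararray_len##N##_t;   \
    typedef struct { long hash; wchararray_len##N##_t chars; } string_len##N##_t

DECLARE_STRING_LEN(3);
DECLARE_STRING_LEN(5);

/* Static data is written with the fixed-size twin (first union member)
   and read through the generic member b. */
static union { string_len5_t a; string_t b; } s1 = {
    { 617862378, { 5, { 'H', 'e', 'l', 'l', 'o' } } }
};
static union { string_len3_t a; string_t b; } s2 = {
    { 12345, { 3, { 'a', 'b', 'c' } } }   /* made-up hash value */
};

/* Returns 1 when each twin puts items at the same offset as string_t. */
static int layouts_match(void)
{
    return offsetof(string_len5_t, chars.items) == offsetof(string_t, chars.items)
        && offsetof(string_len3_t, chars.items) == offsetof(string_t, chars.items);
}

/* Generic code only ever sees string_t, whatever the actual length. */
static int last_item(const string_t *s)
{
    assert(s->chars.length > 0);
    return s->chars.items[s->chars.length - 1];
}
```

Because the generic type declares `items[]` rather than `items[1]`, the compiler has no fixed bound to assume for loops over the items, which is the property the GCC 4.8 breakage hinged on.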

Armin Rigo

4 Answers

2

It's hackish, but could this work?

#include <stdio.h>

typedef struct {
    int length;
    int items[];     /* 1 */
} wchararray_t;

typedef struct {
    long hash;
    wchararray_t chars;   /* 2 */
    int dummy[]; /* hack here */
} string_t;

static string_t s1 = {
    617862378, { 5 },
    { 'H', 'e', 'l', 'l', 'o' }  /* 3: changed assignment */
};

int main(void)
{
    int i;
    for (i=0; i < 5; ++i) {
        putchar(s1.chars.items[i]);
    }
    putchar('\n');
    return 0;
}

GCC gives me warnings:

xx.c:10:22: warning: invalid use of structure with flexible array member [-pedantic]
xx.c:16:9: warning: initialization of a flexible array member [-pedantic]
xx.c:16:9: warning: (near initialization for ‘s1.dummy’) [-pedantic]

But it seems to work.

Reference

Edit: How about adding a "padding member" that makes sure items[] is always properly aligned?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* change to the strictest alignment type */
typedef long aligner;

typedef struct {
    long stuff;   /* to show misalignment on 64-bit */
    int length;
    aligner padding;
    int items[];
} chararray_t;

typedef struct {
    long hash;
    chararray_t chars;
    int dummy[];
} string_t;

static string_t b1 = {
    617862378,
    { 42, 5 },
    {-1, -2, -3, -4, -5}
};

int main(void)
{
    int i;

    printf("sizeof chararray_t: %zu\n", sizeof(chararray_t));
    printf("offsetof items: %zu\n", offsetof(chararray_t, items));

    printf("sizeof string_t: %zu\n", sizeof(string_t));
    printf("offsetof dummy: %zu\n", offsetof(string_t, dummy));

    for (i=0; i < 5; ++i) {
        printf("%d ", b1.chars.items[i]);
    }
    putchar('\n');
    for (i=0; i < 5; ++i) {
        printf("%d ", b1.dummy[i]);
    }
    putchar('\n');
    return 0;
}

When I run the above, I seem to get the correct answer:

sizeof chararray_t: 24
offsetof items: 24
sizeof string_t: 32
offsetof dummy: 32
-1 -2 -3 -4 -5 
-1 -2 -3 -4 -5 
Alok Singhal
  • No, I thought about this actually, but it doesn't work for alignment issues. You don't know that the dummy array starts right after the "int length". It wouldn't if wchararray_t contained also a field of type "long" on 64-bit machines: sizeof(wchararray_t)==16 rather than 12 for alignment reasons, but you want the array to start after 12 bytes already (as it would naturally if it was declared as `int items[actual_size]`). – Armin Rigo Apr 15 '13 at 19:00
  • @ArminRigo: the array `items` wouldn't start after 12 bytes due to alignment issues, yes. But then if it did, `dummy` and `items` wouldn't "match up", so don't you *want* the padding to make the array `items` start at byte 16 in this case anyway? – Alok Singhal Apr 15 '13 at 19:53
  • No, from the original code, if you make the type `struct { int length; int items[5]; }` then the 5 * 4 bytes of `items` start after the 4 bytes of `length`. If you inline this struct inside string_t then `items` start at byte 12. – Armin Rigo Apr 15 '13 at 20:13
  • @ArminRigo, in that case, could you generate code to add a padding `char` array in between? Something like `typedef struct { long stuff; int length; char padding[4]; int items[]; } chararray_t;`. Of course, the size of `padding` (and whether it's there or not) will depend on the alignment requirements. – Alok Singhal Apr 16 '13 at 00:07
  • The size of `padding` depends on what other structure it's inlined into. It can be inlining into different kinds of structures. Not to mention that it's hard to know beforehand the exact alignments details that some given C compiler will produce on some platform. – Armin Rigo Apr 16 '13 at 08:24
  • @ArminRigo, I edited my response to make it independent of padding. Can you check to see if that works? – Alok Singhal Apr 16 '13 at 21:35
  • @ArminRigo, did you find a solution? If yes, I would be interested in knowing what it was. – Alok Singhal Apr 19 '13 at 05:25
  • I actually have implemented the hackish solution I gave already in my question: declaring the types k+1 times, once with "int items[];" and the k other times with "int items[N];" for the various values of N that are needed. – Armin Rigo Apr 19 '13 at 14:31
  • @ArminRigo, OK, thanks. Did you see my edit (`aligner` stuff)? I am curious whether that would have worked too. – Alok Singhal Apr 19 '13 at 14:49
  • It's the answer with the most interesting alternative imho. I could also have marked nothing as accepted (as per the _NOT RESOLVED_ I added to the question). The aligner stuff would probably work. It is not a full solution in my case, because again it changes the underlying data layout. – Armin Rigo Apr 20 '13 at 08:06
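The alignment objection in the comments above can be made concrete with a small sketch. A trailing `dummy` array in the outer struct starts `sizeof(chararray_t)` bytes after the start of `chars`, while `items` starts at `offsetof(chararray_t, items)`; whenever the compiler adds trailing padding, the two no longer coincide. The helper name `trailing_gap` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long stuff;   /* forces stricter alignment than int on many 64-bit ABIs */
    int length;
    int items[];
} chararray_t;

/* Bytes between the start of items and the end of the struct.
   A nonzero result means an array placed right after the struct
   would not overlay items. */
static size_t trailing_gap(void)
{
    assert(sizeof(chararray_t) >= offsetof(chararray_t, items));
    return sizeof(chararray_t) - offsetof(chararray_t, items);
}
```

On an ABI where `long` is 8 bytes and `int` is 4, this is the 16 versus 12 case mentioned in the comments, giving a gap of 4; where `long` and `int` have the same size the gap would be expected to be 0. The result is decided by the compiler and target, which is why the generator cannot easily predict it.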
2

Answering my own question to write it down. Yet another hack would be to build on top of Alok's suggestion, which may give an occasionally bogus alignment --- and then fix the alignment by init-time code. This assumes that the big majority of such types used in a program happen to be correctly aligned. Code:

#include <stddef.h>   /* offsetof */
#include <string.h>   /* memmove */

typedef struct {
    long stuff;   /* to show misalignment on 64-bit */
    int length;
    int items[];
} chararray_t;

typedef struct {
    long hash;
    chararray_t chars;
    int dummy[];
} string_t;


static string_t b1 = {
    617862378,
    { 42, 5 },
    {-1, -2, -3, -4, -5}
};
/* same with b2 .. b6 */

void fixme(void) {
    /* often compares as equal, and the whole function is removed */
    if (offsetof(string_t, dummy) !=
            offsetof(string_t, chars) + offsetof(chararray_t, items)) {
        static string_t *p_array[] = { &b1, &b2, &b3, &b4, &b5, &b6 };
        string_t *p;
        int i;
        for (i=0; i<6; i++) {
            p = p_array[i];
            memmove(p->chars.items, p->dummy, p->chars.length * sizeof(int));
        }
    }
}
Armin Rigo
  • In the cases you'd need to run this code, is there any real advantage to this method vs. just using runtime initialization? – Alex Gaynor Apr 16 '13 at 04:10
  • If the main point is just static initialization cannot you replace it by generating an `unsigned char static_data[]={...};` and then using `(*((string_t *)(static_data+12345)))` in generated code instead of `s1`? – 6502 Apr 16 '13 at 06:05
  • We cannot use only an `unsigned char` because some fields need to be initialized with pointers to other static data; so it's basically equivalent to declaring the static variables as `string_len5_t` and casting them to `string_t`. But this generates GCC warnings with `-Wstrict-aliasing`, even if probably not bad code, if we systematically do the cast to access the structures. – Armin Rigo Apr 16 '13 at 08:30
  • @AlexGaynor: yes, the first advantage is that the functions are most of the time empty, because most of the time the run-time padding is not needed. But strictly speaking only the C compiler knows when it is needed or not; if it is not needed then the static declaration is already correct. The other advantage is that it takes far less space to initialize data than write code that fills the array one item at a time. – Armin Rigo Apr 16 '13 at 08:45
  • @ArminRigo The closest thing I got while trying the unsigned char array was using offsets instead of pointers and using a macro `OBJ` for example `&OBJ(Node,n->child_offset[i])` instead of `n->child[i]` (this was for a tree node with a variable number of children per node stored as a dynamic array at the end of the node object). The generated machine code included however an extra `addq $data, %rdi` to recreate the pointer from the offset. – 6502 Apr 16 '13 at 13:26
1
#include <stdio.h>
typedef struct {
    int length;
    char items[];     /* 1 */
} chararray_t;

typedef struct {
    long hash;
    chararray_t chars;   /* 2 */
} string_t;

/*static string_t s1 = {
    617862378,
    { 5, { 'H', 'e', 'l', 'l', 'o' } }  // 3
};*/

static string_t s1 =
{
    617862378,
    {6,"Hello"} /* 3 */
};

int main()
{
    printf("%ld %d %s\n",s1.hash,s1.chars.length,s1.chars.items);
    return 0;
}

Add 1 for the null character, et voila! :)

Edit, Also works for 2 levels of nesting (GCC 4.8.0)

#include <stdio.h>
typedef struct {
    int length;
    char items[];     /* 1 */
} chararray_t;

typedef struct {
    long hash;
    chararray_t chars;   /* 2 */
} string_t;

typedef struct {
    long number;
    string_t arr;
}experiment_t;

static experiment_t s1 =
{
    617862378,
    {786,{6,"Hello"}} /* 3 */
};

int main()
{
    printf("%ld %ld %d %s\n",s1.number,s1.arr.hash,s1.arr.chars.length,s1.arr.chars.items);
    return 0;
}

EDIT 2: Found a way around the limitation (see "C initialize array within structure").

Final code:

#include <stdio.h>
typedef struct {
    int length;
    int *items;     /* 1 */
} intarray_t;

typedef struct {
    long hash;
    intarray_t chars;   /* 2 */
    int dummy[2];
} string_t;

/*string_t s1 =
{
    617862378,
    {
        6,
        {1,2,3,4,5,6}
    },
    {
        0,0
    }
};*/

string_t s1 = {617862378,{0},{0,0}};   /* {0} instead of {}: empty braces are not standard C99 */

int main()
{
    int i=0;
    intarray_t  t1 = {.length = 6, .items = (int[6]){1,2,3,4,5,6}};
    s1.chars = t1;
    printf("%ld %d\n",s1.hash,s1.chars.length);
    while(i<s1.chars.length)
    {
        printf("%d",s1.chars.items[i]);
        i++;
    }
    putchar('\n');
    return 0;
}
Community
  • This compiles on gcc (4.7.2), however it gives the same "initialization of flexible array member is not allowed" error message on Clang (425.0.27). – Alex Gaynor Apr 15 '13 at 14:38
  • we keep string length, among other things so we don't have to null-terminate them. – fijal Apr 15 '13 at 14:46
  • We don't null terminate them, but when we write "abc", I believe that C already adds the '\0' character, but you are right. We can subtract the +1 from the length :) – Binayaka Chakraborty Apr 15 '13 at 14:50
  • also, it makes it work for one level of nesting. Can you make it two levels of nesting? Or can you make it work if it's int[] instead of char[]? – fijal Apr 15 '13 at 14:55
  • See above, it works for two levels of nesting also. I know that you can't use structures containing a flexible array member in an array (of the structure) (see C99 standard §6.7.2.1/2), but apparently GCC supports it if the unknown type is at the end of the structure declaration. int[] doesn't work :( – Binayaka Chakraborty Apr 15 '13 at 15:35
  • Uh, so it works for arrays of chars, provided we use the syntax with double quotes. It doesn't at all for other array types, like arrays of ints. Sorry, not a good enough answer then... – Armin Rigo Apr 15 '13 at 17:30
  • @ArminRigo: Check this out [http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fstrct.htm] Apparently, IBM's extension allows you to do what needs to be done :) – Binayaka Chakraborty Apr 15 '13 at 18:19
  • @ArminRigo: Please test the latest edit and tell me if it works on your end :) – Binayaka Chakraborty Apr 15 '13 at 18:43
  • You replaced `int[]` with `int *`, thus introducing an indirection which changes the problem. – Armin Rigo Apr 15 '13 at 20:15
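For reference, the pointer-based variant discussed in this thread can also be initialized statically, without the runtime assignment in `main`. This is a sketch only, and as the last comment points out it changes the problem: the items live in a separate object reached through a pointer, so the memory layout is not the inline one the question asks for. The helper `item_at` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int length; const int *items; } intarray_t;
typedef struct { long hash; intarray_t chars; } string_t;

/* At file scope a C99 compound literal has static storage duration,
   so its address can be used in a static initializer. */
static string_t s1 = {
    617862378,
    { 5, (const int[]){ 'H', 'e', 'l', 'l', 'o' } }
};

/* Hypothetical accessor: every read goes through the extra indirection. */
static int item_at(const string_t *s, int i)
{
    assert(i >= 0 && i < s->chars.length);
    return s->chars.items[i];
}
```

This works for any element type, not just `char`, but each element access loads the `items` pointer first, which is exactly the indirection the inline layout avoids.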
0

I assume there is some reason for keeping the string "inside" the struct and that you want to save a char, by not initializing with a C-string.

But, if not, you could do:

typedef struct {
    int length;
    char *items;     /* 1 */
} chararray_t;

typedef struct {
    long hash;
    chararray_t chars;   /* 2 */
} string_t;

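static string_t s1 = {
    617862378,
    { 5, "Hello" }  /* 3: items points at a string literal; length omits the '\0' */
};
/* An assignment such as s1.chars.items[4] = 'o'; is not allowed at file scope,
   and writing into a string literal is undefined behavior. */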

It looks like you can also do the union trick, but with a typecast instead:

#include <stdio.h>

typedef struct { int length; int items[]; } wchararray_t;
typedef struct { long hash; wchararray_t chars; } string_t;

typedef struct { int length; int items[5]; } wchararray_len5_t;
typedef struct { long hash; wchararray_len5_t chars; } string_len5_t;

static union { string_len5_t a; string_t b; } s5 = {
    617862378,
    { 5, { 'H', 'e', 'l', 'l', 'o' } }
};

string_t *s1 = (string_t*) &s5 ;

int main( int argc, char *argv[])
{

  for( int i = 0 ; i < s1->chars.length ; i++ )
    {
      printf ( "%c", s1->chars.items[i] );
    }
  printf( "\n" );
}
renejsum
  • By keeping the `char[]` inside the `struct` you don't save one `char`, you save a memory indirection. – Alex Gaynor Apr 15 '13 at 14:40
  • Should have been more clear: 'H','e','l','l','o' saves a char AND allocates in the struct. But since the string is statically allocated, will the compiler not optimise the indirection out? – renejsum Apr 15 '13 at 14:49
  • No, the compiler cannot optimize indirections in data structures. The goal here is to have a precise layout in memory (e.g. respecting what is expected from PyPy's GCs) --- but at the same time have a static variable following the same layout. – Armin Rigo Apr 15 '13 at 17:33
  • Just noticed that @ArminRigo changed char to int in the example, it still works, but the two struct need to both use int, to have the same layout – renejsum Apr 16 '13 at 04:11
  • The other solution you added looks interesting (with `string_t *const s1` to let the compiler remove an indirection). I didn't realize it but indeed, we can do "crazy" casts for static data, even if we can't do it in code (writing `#define s1 ((string_t)&s5)` instead would generate `warning: dereferencing type-punned pointer will break strict-aliasing rules`). – Armin Rigo Apr 16 '13 at 08:42
  • Basically you can typecast any pointer to any memory area, it could just be char dummy[200]; that you point to. Maybe the example should be (*s1).chars.items[i], not sure ? – renejsum Apr 16 '13 at 09:36