23

Is this how one can use the "extra" memory allocated while using the C struct hack?

Questions:

I have a C struct hack implementation below. My question is how I can use the "extra" memory that I have allocated with the hack. Can someone please give me an example of using that extra memory?

#include<stdio.h>
#include<stdlib.h>

int main()
{

    struct mystruct {

        int len;
        char chararray[1];

    };

    struct mystruct *ptr = malloc(sizeof(struct mystruct) + 10 - 1);
    ptr->len=10;


    ptr->chararray[0] = 'a';
    ptr->chararray[1] = 'b';
    ptr->chararray[2] = 'c';
    ptr->chararray[3] = 'd';
    ptr->chararray[4] = 'e';
    ptr->chararray[5] = 'f';
    ptr->chararray[6] = 'g';
    ptr->chararray[7] = 'h';
    ptr->chararray[8] = 'i';
    ptr->chararray[9] = 'j';


}
Ankur Agarwal
  • 2
    Can you give us a little more context? What is the c struct hack? What is your actual question? – Robert Harvey May 14 '13 at 22:00
  • 1
    @RobertHarvey I believe extending the array to 9 elements from 1 element. – Lews Therin May 14 '13 at 22:01
  • @RobertHarvey Whatever Lews said. Extending the array to 10 from 1. – Ankur Agarwal May 14 '13 at 22:02
  • 1
    @RobertHarvey Faking a flexible array member by declaring an array of length 1 at the end. Before flexible array members were added in C99, that was _the_ way to have structs with arrays of varying (and runtime-determined) length. (Undefined behaviour, though.) But I suspect yours was a rhetorical question. – Daniel Fischer May 14 '13 at 22:03
  • @DanielFischer: Somewhat rhetorical. I genuinely didn't know. It looks wildly unsafe, although I suppose it works if you're careful with it. Could say that about all of C, though. – Robert Harvey May 14 '13 at 22:07
  • 1
    @RobertHarvey it's scary when you first see it, but in practice it works quite well. – Keith Nicholas May 14 '13 at 22:09
  • 3
    @RobertHarvey It was so widely used that no compiler could afford breaking it. But it always was explicitly undefined behaviour according to the standard [well, obviously not before the first standard]. Now we have flexible array members standardized, so the struct hack should follow the dodo. – Daniel Fischer May 14 '13 at 22:10
  • Question 2.6 of the [comp.lang.c FAQ](http://www.c-faq.com/) discusses the struct hack. It's a bit pre-C99-centric, mentioning flexible array members only as an afterthought. – Keith Thompson May 14 '13 at 22:54
  • I was about to say that there's one major C compiler that doesn't support C99 (and therefore didn't support flexible array members). But even though it doesn't support C99 in general, I was surprised to find that MSVC has supported flexible array members since at least VC6. – Michael Burr May 15 '13 at 05:42
  • 1
    If you can use malloc, then the member `chararray` could be a "pointer to char", initialized by a separate `malloc()` call. I don't see the point in using this "hack", even in pre-C99 code. – pablo1977 Jul 07 '14 at 10:42
  • @pablo1977: I suppose it reduces the number of malloc/free calls from 2 to 1..? – Hubro Mar 22 '15 at 07:08

4 Answers

18

Yes, that is (and was) the standard way in C to create and process a variably-sized struct.

That example is a bit verbose. Most programmers would handle it more deftly:

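#include <stdlib.h>
#include <string.h>

struct mystruct {
    int len;
    char chararray[1];  // some compilers would allow [0] here
};

int main(void)
{
    const char *msg = "abcdefghi";
    int n = strlen (msg);

    // room for the struct plus the n characters and the terminating null
    struct mystruct *ptr = malloc(sizeof(struct mystruct) + n + 1);

    ptr->len = n;
    strcpy (ptr->chararray, msg);

    free(ptr);
    return 0;
}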
wallyk
  • But why would anyone ever do that? Should not a struct hold pointers to variable memory? – djechlin May 14 '13 at 22:04
  • 5
    It **was**. The standard way now is to use a flexible array member. – Daniel Fischer May 14 '13 at 22:05
  • @DanielFischer I'm not sure why I care.. but what's a flexible array member? Are you talking realloc? – Lews Therin May 14 '13 at 22:08
  • 1
    @LewsTherin see http://stackoverflow.com/questions/246977/flexible-array-members-in-c-bad – Alok Singhal May 14 '13 at 22:11
  • 2
    @LewsTherin `struct foo { int length; other_type other_member; double flexible_array_member[]; };`. – Daniel Fischer May 14 '13 at 22:12
  • 1
    @wallyk And strcpy works here because we allocated extra memory in malloc which can be safely pointed to by chararray without the danger of exceeding the bounds of an array? Right ? – Ankur Agarwal May 14 '13 at 22:12
  • 1
    @abc: That is correct. However, this is also frequently a potential source of bugs when someone half-aware changes part of it without recognizing the dependency of the other part. – wallyk May 14 '13 at 22:14
  • 1
    @abc: Because `strlen` returns a `size_t`, and the C standard does not guarantee that an `int` can hold the returned value. – Eric Postpischil May 15 '13 at 13:26
  • This code allocates n+1 bytes beyond mystruct. However, the size of mystruct already includes one byte for the array, so it is only necessary to allocate n more bytes to provide space for n non-null characters in the string plus one null character. – Eric Postpischil May 15 '13 at 13:27
  • @EricPostpischil: Actually, I would normally write the struct with the last element's array dimension as `[0]`, but that causes a compilation error on some common compilers–specifically `MSC`. "Wasting" one byte is a very small price to pay so that it always works safely in spite of the compiler used. – wallyk May 15 '13 at 16:44
  • @abc - More generally, `size_t` is designed for array indices and object sizes. It is the type of the argument for `malloc` and related functions, and the type returned by the `sizeof` operator. On some platforms, `int` can be 32-bits while pointers (and therefore almost always `size_t`) can be 64-bits. Even if you used `unsigned int` you're still not guaranteed the right size integer. It's like saying `char buffer[1000]; gets(buffer);` is good enough. – Chris Lutz May 15 '13 at 19:36
  • @ChrisLutz In that case only the int in `int n = strlen (msg);` should be replaced and not the one in struct definition..right? – Ankur Agarwal May 16 '13 at 15:27
  • 1
    @abc - Nope, all of them. The `len` in the struct definition is an array index or object size (or at least will be used that way), therefore it is a `size_t`. – Chris Lutz May 17 '13 at 19:56
  • Calling this "standard" is an unfortunate choice of word since, per _the_ Standard, it's undefined behaviour. http://stackoverflow.com/q/3711233/2757035 Just because a lot of people use it doesn't mean it was ever portable. Anyway, why people still fret about this hack when C received a defined way to achieve the same thing (flexible array members) in C99 is beyond me... I remain incredulous at the glacial pace of large swathes of the C community and the lengths they go to to avoid ever moving forward. – underscore_d Jul 01 '16 at 23:54
  • @underscore_d: When you have a code base of 50 million lines of code, much of it written in the 1980s, you can't replace the compiler willy nilly. – wallyk Jul 02 '16 at 05:39
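
The flexible array member mentioned in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `struct message` and `message_new` are hypothetical, and `len` is declared as `size_t` following the suggestion made in the comments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the array has no declared size and
   must be the last member of the struct. */
struct message {
    size_t len;     /* size_t, because it holds an object size */
    char   text[];  /* storage comes from the extra bytes given to malloc */
};

/* Hypothetical helper: allocate a message holding a copy of s.
   Returns NULL if allocation fails; the caller frees the result. */
struct message *message_new(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);

    /* sizeof *m does not count text[], so add room for the
       n characters plus the terminating null byte. */
    struct message *m = malloc(sizeof *m + n + 1);
    if (m == NULL)
        return NULL;

    m->len = n;
    memcpy(m->text, s, n + 1);
    return m;
}
```

Because the array contributes nothing to `sizeof`, there is no "minus one" adjustment to remember, which is the main practical difference from the `[1]` form.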
5

Ever since I read this article (http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx), I've liked to use the struct hack like so:

  #include<stddef.h>   /* offsetof */
  #include<stdio.h>
  #include<stdlib.h>

  int main()
  {

      struct mystruct {

          int len;
          char chararray[1];
      };

      int number_of_elements = 10;
      int i;

      struct mystruct *ptr = malloc(offsetof(struct mystruct, chararray[number_of_elements]));
      ptr->len = number_of_elements;

      for (i = 0; i < number_of_elements; ++i) {
        ptr->chararray[i] = 'a' + i;
      }

  }

I find that not having to remember whether 1 needs to be subtracted (or added or whatever) is nice. This also has the bonus of working in situations where 0 is used in the array definition, which not all compilers support but some do. If the allocation is based on offsetof() you don't need to worry about that possible detail making your math wrong.

It also works without change if the struct uses a C99 flexible array member.
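
As a rough sketch of that last point, the size computation can be written once and applied to either declaration. The struct names, the two macros, and `fam_letters` are hypothetical. The sketch also swaps in `offsetof(..., chararray) + n * sizeof(char)` instead of indexing inside `offsetof` with a runtime value, on the assumption that this form is the more conservative choice across compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct hack_struct {        /* pre-C99 struct hack */
    int  len;
    char chararray[1];
};

struct fam_struct {         /* C99 flexible array member */
    int  len;
    char chararray[];
};

/* Bytes needed for the header plus n array elements. The expression
   has the same shape for both declarations above. */
#define HACK_BYTES(n) (offsetof(struct hack_struct, chararray) + (n) * sizeof(char))
#define FAM_BYTES(n)  (offsetof(struct fam_struct,  chararray) + (n) * sizeof(char))

/* Hypothetical helper: allocate a struct holding n letters starting at 'a'. */
struct fam_struct *fam_letters(int n)
{
    assert(n >= 0 && n <= 26);
    struct fam_struct *p = malloc(FAM_BYTES((size_t)n));
    if (p == NULL)
        return NULL;

    p->len = n;
    for (int i = 0; i < n; ++i)
        p->chararray[i] = (char)('a' + i);
    return p;
}
```

Note that neither macro mentions the declared array length, so changing `[1]` to `[0]` or `[]` does not change the arithmetic.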

Michael Burr
  • 1
    It is hard to look past the statement that C99 allows 0-sized arrays in this blog post though (C99 has the crucial word “nonempty” in 6.2.5:20). It is also standardized that the size of an array of size n is n times the size of the element, and the standard implicitly assumes that the size of an object is never zero in a couple of places (though GCC took the risk to allow zero-sized arrays as an extension. I don't understand how they accepted that risk, but perhaps if it's only ever at the end of a struct, it's okay) – Pascal Cuoq May 14 '13 at 22:36
  • I think the mention in the article about zero length arrays not being legal until C99 is a somewhat imprecise combination of making clear that it's not legal in C90 and that C99 permits incomplete array types at the end of a struct (which is not too much of a stretch from zero-sized arrays even if it's not precisely that). – Michael Burr May 14 '13 at 22:51
1

I would advise against that due to possible alignment issues. Instead, consider this:

struct my_struct
{
    char *arr_space;
    unsigned int len;
};

struct my_struct *ptr = malloc(sizeof(struct my_struct) + 10);
ptr->arr_space = (char *)(ptr + 1);  /* first byte past the struct */
ptr->len = 10;

This will give you locality as well as safety :) and avoid weird alignment issues.

By alignment issues I meant possible access delays for accessing unaligned memory.

In the original example, if you add a byte-sized or otherwise non-word-aligned member (byte, char, short), the compiler may extend the size of the structure, but as far as your pointer is concerned you are reading the memory directly after the end of the struct (non-aligned). This means that if you have an array of an aligned type such as int, every access will incur a performance hit on CPUs that are penalized for reading unaligned memory.

struct
{
    byte_size data;
    char *var_len;
    /* some_align: padding added by the compiler */
};

In the original case you will be reading from the some_align region which is just filler but in my case you will read from aligned extra memory afterwards (which wastes some space but that's typically okay).

Another benefit of doing this is that it's possible to get more locality from allocations by allocating all the space for variable length members of a struct in one allocation rather than allocating them separately (avoids multiple allocation call overheads and gives you some cache locality rather than bouncing all over memory).
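
A minimal sketch of that single-allocation idea, assuming two variable-length strings: the names `struct person` and `person_new` are made up for illustration. Only `char` data is stored after the struct here; element types with stricter alignment would need their offsets rounded up, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record with two variable-length strings. Both pointers
   refer to storage that lives directly after the struct, inside the
   same malloc block. */
struct person {
    char *name;
    char *email;
};

/* Allocate the struct and both strings with a single malloc call.
   A single free(p) releases everything. */
struct person *person_new(const char *name, const char *email)
{
    assert(name != NULL && email != NULL);
    size_t name_bytes  = strlen(name) + 1;   /* include the null byte */
    size_t email_bytes = strlen(email) + 1;

    struct person *p = malloc(sizeof *p + name_bytes + email_bytes);
    if (p == NULL)
        return NULL;

    /* char has no alignment requirement, so the bytes right after the
       struct can hold the strings back to back. */
    char *storage = (char *)(p + 1);
    p->name  = storage;
    p->email = storage + name_bytes;
    memcpy(p->name, name, name_bytes);
    memcpy(p->email, email, email_bytes);
    return p;
}
```

The trade-off is that the struct cannot be copied or moved with a plain assignment, because the pointers would still refer to the original block.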

Jesus Ramos
  • 5
    `char` arrays do not have any alignment requirements -- they can be aligned on any boundary. – Adam Rosenfield May 14 '13 at 22:03
  • 1
    @AdamRosenfield I meant for other fields in the struct the size of the struct might be aligned and you will take a hit when accessing unaligned memory in some CPUs – Jesus Ramos May 14 '13 at 22:04
  • 1
    +1 for alignment issues ... Adam, I think that Jesus's point is that *arr_space can be any-typed. – Ahmed Masud May 14 '13 at 22:04
  • 1
    Could you give an example of alignment problems? I can't think of any here. Your technique is common when a struct is returned by some POSIX APIs and there are two strings stored past the end of the struct, in which case you need two pointers. See getpwuid_r, etc. – Nicholas Wilson May 14 '13 at 22:04
  • The variable sized element is at the end of the struct so it has no effect on the alignment of other members. – Paul R May 14 '13 at 22:04
  • @PaulR Only if there is one variable sized element. This can be extended for multiple ones that will give you cache locality. – Jesus Ramos May 14 '13 at 22:05
  • @Ahmed it's only an alignment issue if he can give us one! In the example the OP gave, there don't seem to be alignment problems. – Nicholas Wilson May 14 '13 at 22:06
  • I still don't see any potential alignment issues - without a concrete example I don't buy this argument. – Paul R May 14 '13 at 22:06
  • 1
    of course there is an alignment issue, if the order of the struct is as OP has then `int len` is the first element of the struct, and if it's misaligned then the code will intermittently cause problems; the fact that Jesus put the char * FIRST makes the alignment issue go away . – Ahmed Masud May 14 '13 at 22:09
  • 3
    @AhmedMasud : How can `int len;` ever be misaligned when it's *before* the variable sized section? – Roddy May 14 '13 at 22:10
  • @PaulR Is the one I added convincing enough? – Jesus Ramos May 14 '13 at 22:12
  • @NicholasWilson Is this example good enough? This is really only for CPUs that take hits from accessing unaligned memory. – Jesus Ramos May 14 '13 at 22:13
  • 4
    @Jesus: sorry, no: the last element in the struct is an array of size 1. It is already correctly aligned. If you allocate space for additional elements then these elements too will be correctly aligned. – Paul R May 14 '13 at 22:13
  • @PaulR This places the restriction that your variable length type must always be at the end which isn't always possible if you need more than one. – Jesus Ramos May 14 '13 at 22:14
  • Unlike your proposal, the "hack" structure can be overlaid on an existing buffer (such as a message packet) to impose a structure on it. So while this is better in those cases where it will work, the "hack" still has its uses. – Clifford May 14 '13 at 22:15
  • @Roddy I was saying for examples where you may have multiple variable length member. – Jesus Ramos May 14 '13 at 22:16
  • 1
    @Jesus: well this is the canonical form for this kind of hack - the whole point is that the variable-sized element is at the end of the struct, and the OP has posted a typical example of this. You're moving the goalposts by suggesting that the variable-sized element might not be at the end of the struct, or that there might be more than one variable-size element - that's not what the OP was asking about. – Paul R May 14 '13 at 22:16
  • 4
    @JesusRamos - but that's *still* not an alignment issue. That's just a different problem. – Roddy May 14 '13 at 22:17
  • @Roddy I guess I am more used to doing this for increasing cache locality of variable length struct members (to allocate them all at once to avoid multiple calls) so I tend to use this technique. – Jesus Ramos May 14 '13 at 22:19
  • 4
    @JesusRamos: A flexible array member can *only* be the last member of a struct; likewise for the older "struct hack". – Keith Thompson May 14 '13 at 22:52
1

It is 'correct', but you'd need a good reason to do that over a more reasonable solution. More commonly perhaps you'd use this technique to "overlay" some existing array to impose some sort of header structure on to it.

Note that GCC by extension allows a zero length array member for exactly this purpose, while ISO C99 "legitimises" the practice by allowing a member with empty brackets (only as the last member).

Note that there are some semantic issues - sizeof the struct will not of course account for the "flexible" size of the final member, and passing the struct "by value" will only pass the header and first element (or no element using the GCC extension or C99 flexible array member). Similarly direct struct assignment will not copy all the data.
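
One way to work around those copy semantics is to compute the full size explicitly and copy with `memcpy`. The sketch below assumes a C99 flexible array member; `struct packet`, `packet_bytes`, and `packet_clone` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct packet {
    size_t len;       /* number of bytes stored in data */
    char   data[];    /* C99 flexible array member */
};

/* Hypothetical helper: total bytes occupied by a packet with n data
   bytes. sizeof(struct packet) alone covers only the header. */
size_t packet_bytes(size_t n)
{
    return sizeof(struct packet) + n;
}

/* Hypothetical helper: copy the header and the data. A plain
   `*dst = *src` would copy only the header. */
struct packet *packet_clone(const struct packet *src)
{
    assert(src != NULL);
    size_t total = packet_bytes(src->len);

    struct packet *dst = malloc(total);
    if (dst != NULL)
        memcpy(dst, src, total);
    return dst;
}
```

For the same reason, such structs are usually passed around by pointer rather than by value.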

Clifford