
Is code like this valid, or does it invoke undefined behaviour (UB)?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct __attribute__((__packed__)) weird_struct
{
    int some;
    unsigned char value[1];
};

int main(void)
{
    unsigned char text[] = "Allie has a cat";
    struct weird_struct *ws =
        malloc(sizeof(struct weird_struct) + sizeof(text) - 1);
    ws->some = 5;
    strcpy(ws->value, text);
    printf("some = %d, value = %s\n", ws->some, ws->value);
    free(ws);
    return 0;
}

http://ideone.com/lpByQD

I’d never have thought it valid to do something like this, but it seems that System V message queues do exactly that: see the man page.

So, if SysV msg queues can do that, perhaps I can do this too? I think I’d find this useful to send data over the network (hence the __attribute__((__packed__))).

Or perhaps this is a specific guarantee of SysV msg queues, and I shouldn’t do something like that elsewhere? Or perhaps the technique can be used, but I’m doing it wrong? I figured I’d better ask.

This - 1 in malloc(sizeof(struct weird_struct) + sizeof(text) - 1) is because I take into account that one byte is allocated anyway thanks to unsigned char value[1] so I can subtract it from sizeof(text).

Sourav Ghosh
  • @IharobAlAsimi Because one byte is counted in `sizeof(struct weird_struct)` because `unsigned char value[1]` has size of one byte anyway? At least that is what I was figuring out. –  May 02 '17 at 11:23
  • @IharobAlAsimi, it does look like that is correct. Though I agree with your suggestion to use `strlen()` instead. – anonymoose May 02 '17 at 11:25
  • Accessing an array beyond its declared limits invokes undefined behaviour. It worked on most compilers, but with flexible array members compilers might handle this more strictly; don't rely on the above code to work. – too honest for this site May 02 '17 at 11:42
  • @Olaf Thus I’d suppose POSIX **must’ve** mandated this is not UB, or else using SysV msg queues, which are mentioned by POSIX, would be impossible without invoking UB? Is this correct? –  May 02 '17 at 11:45
  • @gaazkam: I did not check POSIX for that, and your question is tagged C only, not POSIX. And the C standard is clear about that. I don't see a reason one would be required to violate this rule for message queues, and I strongly doubt POSIX states something like that. – too honest for this site May 02 '17 at 11:53
  • Whether or not the original `arr[1]` struct hack provokes UB, when the structure is allocated with no declared type (e.g. via `malloc`), depends on the exact meaning of the word "object" in the C standard, and that has never been resolved to anyone's satisfaction; despite ongoing attempts to fix the wording, the text still contradicts itself, and there are still at least three interpretations that have a strong case for being the "intended" meaning; under two of those it is UB, but under the third, it isn't. – zwol May 02 '17 at 15:57
  • @zwol: One of the stated goals of the Standard was to avoid doing anything that would prevent implementations from supporting the existing corpus of C code. Unless the authors of the Standard were being disingenuous, they cannot have intended that implementations which might be called upon to run programs that exploited the struct hack should do anything other than process positive array subscripts on the last item of a structure using physical address arithmetic if the resulting pointer would land in the same allocation. Whether or not they intended to require such behavior... – supercat May 04 '17 at 21:54
  • ...on all implementations, there is no indication that they viewed code using such constructs as "defective". Such code may only work on implementations designed for e.g. low-level programming rather than those designed for high-end number crunching, but since there's no way all code is going to run usefully on all implementations anyway, the inability to use code on some specialized implementations shouldn't be seen as a big loss. – supercat May 04 '17 at 22:04
  • @supercat According to http://c-faq.com/struct/structhack.html , "Dennis Ritchie has called it ``unwarranted chumminess with the C implementation,'' and an official interpretation has deemed that it is not strictly conforming with the C Standard" –  May 06 '17 at 08:51

2 Answers


The standard C way (since C99) to do this is to use a flexible array member. The last member of the structure must have an incomplete array type, and you allocate the required amount of memory at run time.

Something like

struct __attribute__((__packed__)) weird_struct
{
    int some;
    unsigned char value [ ];   //nothing, no 0, no 1, no nothing...
}; 

and later

struct weird_struct *ws =
    malloc(sizeof(struct weird_struct) + strlen("this to be copied") + 1);

or

struct weird_struct *ws =
    malloc(sizeof(struct weird_struct) + sizeof("this to be copied"));

will do the job.
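Putting the two fragments together, a minimal sketch of the flexible-array version might look like the following. The names `flex_msg` and `flex_msg_new` are hypothetical, and the `__packed__` attribute is left out here on the assumption that portability matters more than layout for the illustration.

```c
#include <stdlib.h>
#include <string.h>

struct flex_msg             /* hypothetical name, not from the question */
{
    int some;
    unsigned char value[];  /* flexible array member (C99 and later) */
};

/* Allocate a flex_msg whose value[] holds a copy of text, including the
   terminating '\0'. Returns NULL if the allocation fails. */
struct flex_msg *flex_msg_new(int some, const char *text)
{
    size_t len = strlen(text) + 1;                /* +1 for the '\0' */
    struct flex_msg *m = malloc(sizeof *m + len); /* header + payload */
    if (m == NULL)
        return NULL;
    m->some = some;
    memcpy(m->value, text, len);  /* stays inside the allocated object */
    return m;
}
```

A caller would do something like `struct flex_msg *m = flex_msg_new(5, "Allie has a cat");`, check the result for `NULL`, use `m->value` as a string, and release it with `free(m)`.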

Related, quoting the C11 standard, chapter §6.7.2.1

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.


Regarding the one-element array usage, the online GCC manual page on zero-length array support gives this example:

struct line {
  int length;
  char contents[0];
};

struct line *thisline = (struct line *)
  malloc (sizeof (struct line) + this_length);
thisline->length = this_length;

In ISO C90, you would have to give contents a length of 1, which means either you waste space or complicate the argument to malloc.

which also answers the -1 part in the malloc() argument, as sizeof(char) is guaranteed to be 1 in C.
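To make the size arithmetic concrete, here is a small sketch comparing the two allocation formulas. The struct and function names are hypothetical, and `n` is assumed to be at least 1 for the one-element variant.

```c
#include <stddef.h>

struct hack_c90 { int length; char contents[1]; };  /* C90-style struct hack */
struct flex_c99 { int length; char contents[];  };  /* C99 flexible array member */

/* Bytes to request for n payload bytes with the one-element array:
   one payload byte is already counted in sizeof, hence the -1. */
size_t hack_alloc_size(size_t n)
{
    return sizeof(struct hack_c90) - 1 + n;
}

/* With a flexible array member the array contributes nothing to sizeof,
   so no correction term is needed. */
size_t flex_alloc_size(size_t n)
{
    return sizeof(struct flex_c99) + n;
}
```

Without packing, `sizeof(struct hack_c90)` usually also includes trailing padding on typical ABIs, so the `- 1` form may request a few bytes more than strictly needed; that is harmless, and it is part of what the GCC manual means by complicating the argument to `malloc`.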

Sourav Ghosh

The Standard allows implementations to act in any way they see fit if code accesses an array object beyond its stated bounds, even if the code owns the storage that would be accessed thereby. So far as I can tell, this rule is intended to allow for a compiler given something like:

struct s1 { char arr[4]; char y; } *p;
int x;
...
p->y = 1;
p->arr[x] = 2;
return p->y;

to treat it as equivalent to:

struct s1 { char arr[4]; char y; } *p;
int x;
...
p->arr[x] = 2;
p->y = 1;
return 1;

avoiding an extra load step, without having to pessimistically allow for the possibility that x might equal 4. Quality compilers should be able to recognize certain constructs which involve accessing arrays beyond their stated bounds (e.g. those involving a pointer to a structure with a single-element array as its last element) and handle them sensibly, but nothing in the Standard would require that they do so, and some compiler writers take the attitude that permission for compilers to behave in nonsensical fashion should be interpreted as an invitation to do so.

I think that behavior would be defined, even for the x==4 case (meaning the compiler would have to allow for the possibility of it modifying y), if the array write were handled via something like: ((char*)(struct s1*)(p->arr))[x] = 2; but the Standard is not really clear on whether the cast to struct s1* is necessary.
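As a sketch of a related approach (an assumption on my part, not something the Standard text above spells out): index the bytes of the whole structure through an `unsigned char *` derived from `p`, so the access is bounded by the struct object rather than by `arr`. This is generally understood to be defined as long as the offset stays inside the object. The function name `store_byte` is hypothetical.

```c
#include <stddef.h>

struct s1 { char arr[4]; char y; };

/* Store v at byte offset (offsetof(struct s1, arr) + x) of *p, addressing
   the bytes of the whole struct instead of indexing p->arr.
   The caller must keep the resulting offset below sizeof(struct s1). */
void store_byte(struct s1 *p, size_t x, unsigned char v)
{
    unsigned char *bytes = (unsigned char *)p;  /* first byte of *p */
    bytes[offsetof(struct s1, arr) + x] = v;
}
```

Because the write goes through a character pointer into the struct, a compiler has to assume it may modify `y`, which is exactly the x==4 case discussed above.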

supercat
  • *The Standard allows implementations to act in any way they see fit* Another way of saying: undefined behavior. – 2501 May 03 '17 at 14:30
  • @2501: The Standard makes no effort to define all the behaviors that would make a compiler suitable for a particular purpose. The fact that the Standard *allows* implementations to act in nonsensical fashion in a particular situation does not mean that such behavior wouldn't make a compiler unsuitable for many purposes. Relatively few tasks can be performed by all conceivable conforming C implementations, and consequently few programs can be expected to behave usefully on all C implementations. Most programs can only be expected to run on implementations which are suitable... – supercat May 03 '17 at 15:22
  • ...to their needs. The fact that a particular program does not run usefully on a particular implementation doesn't imply that either is defective, but merely that the implementation is not suitable for use with the program. I'm not sure why some compiler writers interpret the phrase "non-portable or erroneous" in the definition of UB as simply "erroneous", but that seems to be the fashionable religion. – supercat May 03 '17 at 15:25
  • @2501: To put things another way, if 90% of implementations, including all implementations suitable for some purpose, specify the behavior of some particular action, then on those 90% of implementations the action will have defined behavior whether the Standard mandates it or not. The fact that the Standard doesn't define a behavior for some action doesn't mean that the behavior is undefined in all contexts. – supercat May 03 '17 at 18:47