I came across a concept which some people call a "Struct Hack" where we can declare a pointer variable inside a struct, like this:

struct myStruct{
    int data;
    int *array;
};

and later on when we allocate memory for a struct myStruct using malloc in our main() function, we can simultaneously allocate memory for our int *array pointer in the same step, like this:

struct myStruct *p = malloc(sizeof(struct myStruct) + 100 * sizeof(int));

p->array = p+1;

instead of

struct myStruct *p = malloc(sizeof(struct myStruct));

p->array = malloc(100 * sizeof(int));

assuming we want an array of size 100.

The first option is said to be better since we get one contiguous chunk of memory and can free that whole chunk with a single call to free(), versus two calls in the latter case.

Experimenting, I wrote this:

#include<stdio.h>
#include<stdlib.h>

struct myStruct{
    int i;
    int *array;
};

int main(){
    /* I ask for only 40 more bytes (10 * sizeof(int)) */

    struct myStruct *p = malloc(sizeof(struct myStruct) + 10 * sizeof(int)); 

    p->array = p+1; 

    /* I assign values way beyond the initial allocation*/
    for (int i = 0; i < 804; i++){
        p->array[i] = i;
    }

    /* printing*/
    for (int i = 0; i < 804; i++){
        printf("%d\n",p->array[i]);
    }

    return 0;
}

I am able to execute it without problems, without any segmentation faults. Looks weird to me.

I also came to know that C99 has a provision which says that instead of declaring an int *array inside a struct, we can do int array[] and I did this, using malloc() only for the struct, like

struct myStruct *p = malloc(sizeof(struct myStruct));

and initialising array[] like this

p->array[10] = 0; /* I hope this sets the array size to 10 
                    and also initialises array entries to 0 */

But then again this weirdness where I am able to access and assign array indices beyond the array size and also print the entries:

for(int i = 0; i < 296; i++){ // first loop
    p->array[i] = i;
}

for(int i = 0; i < 296; i++){ // second loop
    printf("%d\n",p->array[i]);
}

After printing p->array[i] till i = 296 it gives me a segmentation fault, but clearly it had no problems assigning beyond i = 9. (If I increment 'i' till 300 in the first for loop above, I immediately get a segmentation fault and the program doesn't print any values.)

Any clues about what's happening? Is it undefined behaviour or what?

EDIT: When I compiled the first snippet with the command

cc -Wall -g -std=c11 -O    struct3.c   -o struct3

I got this warning:

 warning: incompatible pointer types assigning to 'int *' from
  'struct str *' [-Wincompatible-pointer-types]
    p->array = p+1;
tf3
  • You still need to allocate memory for the integers. – Bjorn A. Nov 06 '16 at 18:30
  • There is no _variable length array_ in your code. What you are using is called a _flexible array member_ (FAM). How do you think `malloc` should know how many elements you want this array to hold? – too honest for this site Nov 06 '16 at 18:30
  • And the first snippet invokes undefined behaviour. If your compiler is not already crying, enable warnings. If it still does not, it is rubbish and you should get a modern one. Read about undefined behaviour. C does not prevent you from shooting your foot, knee and head. – too honest for this site Nov 06 '16 at 18:32
  • @Olaf So what's happening over here? In the first case I malloc'd explicitly 10*sizeof(ints) and then pointed int *array to p+1, how else am I to tell malloc that I want 10 integers worth of space? – tf3 Nov 06 '16 at 18:34
  • @Olaf I updated to XCode 8 a few days ago and I used this to compile: cc -Wall -g -std=c11 -O struct2.c -o struct2 – tf3 Nov 06 '16 at 18:36
  • Another way would be to have the `struct` member as `int array[1];` and then do a single memory allocation for the `struct`. As commented at the top, your pointer member is not initialised. – Weather Vane Nov 06 '16 at 18:36
  • Then **don't** ignore warnings! – too honest for this site Nov 06 '16 at 18:37
  • @WeatherVane: That will 1) allocate a single element and 2) invoke UB, too, if you dereference higher indexes. gcc has an extension for zero-size arrays, but that is superseded by FAMs and should not be used in new code. – too honest for this site Nov 06 '16 at 18:38
  • @Olaf Ok the only warning I got was this when I compiled the first snippet: incompatible pointer types assigning to 'int *' from 'struct str *' [-Wincompatible-pointer-types] p->array = p+1; – tf3 Nov 06 '16 at 18:39
  • @Olaf not if the `struct`'s memory comes from `malloc`. I placed an array, instead of a pointer. – Weather Vane Nov 06 '16 at 18:39
  • I will not tell you how to do it, because you can easily work it out if you think about it a bit instead of concentrating on asking. Just this: you have all the necessary information in your question already. – too honest for this site Nov 06 '16 at 18:40
  • And you decided to ignore the warning apparently. – too honest for this site Nov 06 '16 at 18:40
  • @tectonicfury You should [edit] your post and add the warning to it. This is very important information. – anatolyg Nov 06 '16 at 18:41
  • @WeatherVane: The array has a length of 1 entry. Dereferencing beyond its bounds is definitely UB. – too honest for this site Nov 06 '16 at 18:42
  • @Olaf C does not care how far out of range you index an array, so long as you own the memory. – Weather Vane Nov 06 '16 at 18:43
  • @anatolyg added the warning – tf3 Nov 06 '16 at 18:45
  • @WeatherVane: The C standard disagrees with you. 6.5.6p8: "... If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated." Annex J clarifies causes for UB: "Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object". A compiler very well can (and modern compilers like gcc will) take advantage of UB. That said, it does not change the problem. The comments are just confusing to a beginner. – too honest for this site Nov 06 '16 at 18:49
  • @Olaf that is about pointers. My suggestion does not use a pointer. – Weather Vane Nov 06 '16 at 18:52
  • @WeatherVane: You really should (and I'm pretty sure you do) know that arrays are implicitly converted to pointers except as the operand of three operators (`&`, `sizeof`, `_Alignof`)! For **all** other operators (and function parameters), an array is first converted to a pointer to its first element. – too honest for this site Nov 06 '16 at 18:54
  • @Olaf and the pointer to the first element is perfectly legal. Once the array-converted-to-pointer reaches a function, nothing is known about its bounds. – Weather Vane Nov 06 '16 at 18:55
  • @WeatherVane: You forgot about the index which is **added**! That's exactly what 6.5.6 is about! – too honest for this site Nov 06 '16 at 18:56
  • @Olaf This is only for information, I got rid of the warning by casting `p->array = (int *) (p + 1);` and the result is the same so apparently the warning had nothing to do with the oddity. Thanks for comments. – tf3 Nov 06 '16 at 19:02
  • @tectonicfury: Never ever cast just to silence the compiler if you don't understand all the implications. If you think the UB will magically disappear with the cast: well, you are wrong! – too honest for this site Nov 06 '16 at 19:07
  • @Olaf I got it, C may allow me to assign values beyond the array. I thought that I would get a SegFault immediately. This gave me deeper understanding that C really leaves it to the programmer to make sure that the code is correct. – tf3 Nov 06 '16 at 19:15
  • @tectonicfury: Well, your teacher and/or C book should have told you this in a very early lesson! That's one reason learning C by trial & error is a **very** bad idea! – too honest for this site Nov 06 '16 at 19:17
  • @WeatherVane using the array's name decays to a pointer, so the pointer arithmetic rules apply. `x[y]` means `*(x+y)`. You have no idea what you are talking about in this comment discussion – M.M Nov 06 '16 at 20:56
  • @M.M so what is wrong with the defined array of length 1 decaying to a pointer? – Weather Vane Nov 06 '16 at 20:58
  • Nothing, but then you are suggesting going on to access outside the bounds of the array via the pointer (which is undefined behaviour) – M.M Nov 06 '16 at 20:59
  • @M.M if memory was allocated why is that any different from indexing a pointer directly? – Weather Vane Nov 06 '16 at 20:59
  • It's not different from indexing a pointer directly. `p[2]` and `p+2` both cause undefined behaviour. – M.M Nov 06 '16 at 21:00
  • See http://stackoverflow.com/questions/3711233/is-the-struct-hack-technically-undefined-behavior – M.M Nov 06 '16 at 21:02
  • @M.M thanks for the useful link, which does not say it is illegal. I was about to say: the `struct` was assigned enough memory for the purpose. The array is passed to a function. It decays to a pointer. You index the pointer in the function. Why is that any different from passing a pointer instead of an array? If you pass the address of the first array element to the function instead, would that still be suspect? The rules for pointers seem to be formulated for the case of an array defined in the straightforward way, where overrunning it *is* undefined behaviour. – Weather Vane Nov 06 '16 at 21:09

2 Answers

Yes, what you see here is an example of undefined behavior.

Writing beyond the end of an allocated array (aka buffer overflow) is a good example of undefined behavior: it will often appear to "work normally", while at other times it will crash (e.g. "Segmentation fault").

A low-level explanation: there are control structures in memory that are situated some distance from your allocated objects. If your program does a big buffer overflow, there is more chance it will damage these control structures, while for more modest overflows it will damage some unused data (e.g. padding). In any case, however, buffer overflows invoke undefined behavior.

The "struct hack" in your first form also invokes undefined behavior (as indicated by the warning), but of a special kind - it's almost guaranteed to work normally in most compilers. However, it's still undefined behavior, so it is not recommended. To sanction its use, the C committee invented the "flexible array member" syntax (your second syntax), which is guaranteed to work, provided you allocate enough memory for the elements you use.

Just to make it clear - assignment to an element of an array never allocates space for that element (not in C, at least). In C, when assigning to an element, it should already be allocated, even if the array is "flexible". Your code should know how much to allocate when it allocates memory. If you don't know how much to allocate, use one of the following techniques:

  • Allocate an upper bound:

        struct myStruct{
            int data;
            int array[100]; // you will never need more than 100 numbers
        };
  • Use realloc
  • Use a linked list (or any other sophisticated data structure)
anatolyg
  • Thanks. The warning which I got upon compiling the first snippet just vanished when I cast `p->array = (int *)(p + 1);` so apparently it's more due to UB than the warning. – tf3 Nov 06 '16 at 19:07
  • Warnings never *cause* problems; they *indicate* that there are problems. Using a cast is a good way to silence a warning, when you "know what you are doing". If you have a C99 compiler, better use the syntax that doesn't require a cast - it's the safer code; also less ugly. – anatolyg Nov 06 '16 at 19:13
  • To detail: A modern standard (currently 2011, aka C11) compliant compiler will also do. No need to use the old 1999 version (aka C99) of the standard. – too honest for this site Nov 06 '16 at 19:19

What you describe as a "Struct Hack" is indeed a hack. It is not worth it, IMO.

p->array = p+1;

will give you problems on many compilers, which will demand an explicit conversion:

p->array = (int *) (p+1);

I am able to execute it without problems, without any segmentation faults. Looks weird to me.

It is undefined behaviour. You are accessing memory on the heap, and many compilers and operating systems will not prevent you from doing so. But it is extremely bad practice to rely on it.

sg7
  • The issue was that I tried the other alternative (that I was aware of) which did not involve a pointer but used a flexible array member. I was not very eager to use the hack, but because the second alternative which I came across wasn't very helpful either, I was intrigued. – tf3 Nov 06 '16 at 19:37