I want to understand:

  • why a char[1] in C is sometimes used in place of a char * (what is the point of doing this?), and
  • how this works internally (what is going on).

Given the following sample program:

#include <stdio.h>
#include <string.h>

struct test_struct {
    char *a;
    char b[1];
} __attribute__((packed));

int main() {

    char *testp;
    struct test_struct test_s;

    testp = NULL;
    memset(&test_s, 0, sizeof(struct test_struct));

    printf("sizeof(test_struct) is: %lx\n", sizeof(struct test_struct));

    printf("testp at: %p\n", &testp);
    printf("testp is: %p\n", testp);

    printf("test_s.a at: %p\n", &test_s.a);
    printf("test_s.a is: %p\n", test_s.a);

    printf("test_s.b at: %p\n", &test_s.b);
    printf("test_s.b is: %p\n", test_s.b);

    printf("sizeof(test_s.b): %lx \n", sizeof(test_s.b));

    printf("real sizeof(test_s.b): %lx \n", ((void *)(&test_s.b) - (void *)(&test_s.a)) );

    return 0;
}

I get the following output (OS X, 64bit):

sizeof(test_struct) is: 9
testp at: 0x7fff62211a98
testp is: 0x0
test_s.a at: 0x7fff62211a88
test_s.a is: 0x0
test_s.b at: 0x7fff62211a90
test_s.b is: 0x7fff62211a90
sizeof(test_s.b): 1 
real sizeof(test_s.b): 8 

Looking at the memory addresses, one can see that even though the struct is 9 bytes large, 16 bytes were allocated, which seems to be caused by char b[1]. But I'm not sure whether those extra bytes were allocated for optimization/memory-alignment reasons, or whether this has to do with C's internal treatment of char arrays.

A real world example can be seen in <fts.h>:

`man 3 fts` shows the struct member `fts_name` as:

            char *fts_name;                 /* file name */

while /usr/include/fts.h defines the member as:

            char fts_name[1];               /* file name */

In the end, fts_name can really be used as a pointer to a C-string. For example, printing to stdout with printf("%s", ent->fts_name) works.

So if a char[1] is really one byte large, it couldn't be used as a memory pointer on my 64bit machine. On the other hand, treating this as a full blown char * doesn't work either, as can be seen with the test_s.b is output above, which should show a NULL pointer then...

grasbueschel
  • Main idea: I'm gonna disappoint you, but pointers are not arrays. –  Oct 12 '12 at 17:18
  • `real sizeof(test_s.b): 8` is **wrong**: `sizeof(char*)` is `8` and `sizeof(char)` is `1`. That's why your structure is 9 bytes. – Shiplu Mokaddim Oct 12 '12 at 17:22
  • Read section 6 of the [comp.lang.c FAQ](http://www.c-faq.com/); it's an *excellent* explanation of the (often confusing) relationship between arrays and pointers in C. – Keith Thompson Oct 12 '12 at 18:37

2 Answers

Here is an answer that describes the char[1] trick. Basically, the idea is to allocate more memory when malloc()ing the struct, to already have some storage for your string without additional allocation. You can sometimes even see char something[0] used for the same purpose, which makes even less intuitive sense.

On the other hand, treating this as a full blown char * doesn't work either, as can be seen with the test_s.b is output above, which should show a NULL pointer then...

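If something is an array, both its name and &name evaluate to the address of the start of the array in C (the name decays to a pointer to the first element, while &name is a pointer to the whole array; the types differ, but the address is the same). This works regardless of whether it's a member of a struct or a free-standing variable. That is why test_s.b prints its own address rather than NULL.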

printf("real sizeof(test_s.b): %lx \n", ((void *)(&test_s.b) - (void *)(&test_s.a)) );

This line gives the size of the space allocated for a, not b, in this struct. Put something after b and use that to subtract. With the packed attribute (which tells the compiler not to insert alignment padding), you should get 1.

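#include <stdio.h>

struct test_struct {
    char *a;
    char b[1];
    char c;
} __attribute__((packed));

int main() {
  struct test_struct s;
  /* Distance from b to the member after it, i.e. the space b occupies.
     Cast to char * because arithmetic on void * is a GNU extension. */
  printf("%td\n", (char *)&s.c - (char *)&s.b);
  return 0;
}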

I get 1.

Jakub Wasilewski
  • Many thanks for your help and the [answer](http://stackoverflow.com/questions/6390331/why-use-array-size-1-instead-of-pointer/6390357#6390357) link you've provided! – grasbueschel Oct 12 '12 at 21:34

Understandably confusing when C isn't your native tongue. A couple of things to clear up first.

In C, all var[n] means is "take the address represented by var, add n*sizeof(element type) bytes to that address, and access the object at the resulting address" (that is, *(var + n)). Also worth noting: the C language does not stop you from walking past an array's declared size.

You'll often find this pattern at the tail of structures that are designed to be overlaid on larger and, more importantly, variable-length allocations of memory. In such structures it is customary (usually mandatory) for one of the preceding members to state how many bytes of the tail buffer are actually valid.

Example:

typedef struct X
{
   unsigned int count;
   char data[1];
} X;

This is markedly different than declaring a pointer member, which is nothing more than a variable that holds an address.

typedef struct Y
{
    unsigned int count;
    char *dataptr;
} Y;

In Y, dataptr holds an address (and has one as well). In X, data is the address.

So why do this? Take a look at this. The following memory dump assumes little endian, 1-byte structure packing, and both integer and pointer native sizes of 4 bytes:

0x00000000  0x10 0x00 0x00 0x00 0x01 0x02 0x03 0x04
0x00000008  0x05 0x06 0x07 0x08 0x09 0x0A 0x0B 0x0C
0x00000010  0x0D 0x0E 0x0F 0x10

Now overlay a struct X on this, and you have

count : 16
data[] : { 0x01, 0x02, 0x03, ... 0x10 }

Overlaying a struct Y on this will have markedly different results.

count : 16
dataptr : 0x04030201 (the bytes 0x01 0x02 0x03 0x04 read as a little-endian 4-byte value)

Remember, in C, you can handily (and usually tragically) walk off the end of an array's declared size. This overlay technique is little more than an exploit of that capability. Given the above for a memory region occupied at the head by a struct X you can do the following:

struct X * pX = funcThatReturnsTheMemoryAddressAbove();
for (unsigned int i=0; i<pX->count; i++)
{
   /* do something with pX->data[i] */
}

Obviously you need to be vigilant in how you allocate and manage memory to do things like this.

Not sure if that helps clear things up at all, but hopefully somewhat.

WhozCraig