Why is char not a single character?

Question

In Microsoft's documentation for DEV_BROADCAST_DEVICEINTERFACE_A the type of dbcc_name is defined as follows:

char  dbcc_name[1];

But as shown in this StackOverflow question it turns out to be a string with multiple characters.

Isn't a char a ~~16 bit value~~ single character? How does this work?

(I had originally thought that char is 16 bits, presumably because that's its size in C#. Actually, it was probably because I was looking at DEV_BROADCAST_DEVICEINTERFACE_W, where it is indeed 2 bytes.)

`sizeof(char)` is always `1`, what that is in bits depends on `CHAR_BIT` (typically 8) — NathanOliver, Jun 03 '20 at 15:37
You must be confused with what a `char` is in some other computer languages. A `char` in C++ has a size of 1, as stated in the previous comment. — PaulMcKenzie, Jun 03 '20 at 15:37
`char dbcc_name[1];` is a stretchy buffer. The structure is over-allocated as `malloc(sizeof(DEV_BROADCAST_DEVICEINTERFACE_A) + buffer_size)`, and the dbcc_name acts as the start of the variable length buffer. It's sort of sketchy C code (but not uncommon), and very bad C++ code. Note what [dbcc_size](https://learn.microsoft.com/en-us/windows/win32/api/dbt/ns-dbt-dev_broadcast_deviceinterface_a) is! — Eljay, Jun 03 '20 at 15:37
Sort of a [flexible array member](https://en.wikipedia.org/wiki/Flexible_array_member), except non-standard (and that's not in C++ anyway, I believe). And chars are more often 8 bits, not 16. — Mat, Jun 03 '20 at 15:38
[This is the C# char](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/char). Is this what causes the confusion? — PaulMcKenzie, Jun 03 '20 at 15:40
@Eljay I called it C++ because that's what Microsoft calls it in the link in the question (As a c# dev, I don't know myself). Are you saying that Microsoft is using bad C++ code or that their documentation is bad? — ispiro, Jun 03 '20 at 15:47
@Mat I called it C++ because that's what Microsoft calls it in the link in the question. As a C# dev, I don't know myself. — ispiro, Jun 03 '20 at 15:48
@ispiro The problem is that all of the Windows API is `C`-based. Anything C++ is something added by MFC or some other third-party wrapper around the Windows API. That documentation is really meant for `C` programmers, and C++ programmers have to figure out whether it can be used (probably over 99% of the API can be used straight out of the box), or work around anything that may cause an issue with C++. — PaulMcKenzie, Jun 03 '20 at 15:49
The situation with C++ and flexible array member (what we'd call a stretchy buffer, back in the day) has been discussed in depth in the comments in this SO question: https://stackoverflow.com/questions/4412749/are-flexible-array-members-valid-in-c — Eljay, Jun 03 '20 at 16:29
I think your question has a very good background, but the title should be adjusted. In general, `char` is always a datatype in C. It specifies neither an aggregate nor a scalar variable. — RobertS supports Monica Cellio, Jun 03 '20 at 16:38
Check out this question: https://stackoverflow.com/q/295027/14065 — Martin York, Jun 17 '20 at 18:13

Konrad Rudolph · Accepted Answer · 2020-06-03T15:47:03.417

7

Isn't a char a 16 bit value?

On what platform? On most platforms (including all that run Windows, AFAIK), char is 8 bits.

How does this work?

The type documentation explains this:

dbcc_size

The size of this structure, in bytes. This is the size of the members plus the actual length of the dbcc_name string (the null character is accounted for by the declaration of dbcc_name as a one-character array.)

In other words, the definition of _DEV_BROADCAST_DEVICEINTERFACE_A exploits the fact that arrays decay to pointers in C++, and thus dbcc_name, which has array type, can be used as a zero-terminated string in most contexts. The actual string is stored contiguously with the _DEV_BROADCAST_DEVICEINTERFACE_A object, at the address starting at the offset of dbcc_name.

It’s worth noting that the size of the array (1) is unrelated to the length of its contents; it is simply the smallest legal static array size in C++. (Legacy code occasionally uses struct members of type char[0]; however, this is a compiler extension and not legal C++.)
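To make the dbcc_size arithmetic concrete, here is a minimal sketch in plain C. It does not use the Windows headers: `fake_broadcast`, `fake_broadcast_new` and `fake_broadcast_name_length` are hypothetical stand-ins I made up, and the real structure has more members. It only assumes the size rule quoted from the documentation above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for DEV_BROADCAST_DEVICEINTERFACE_A.
   The real structure is declared in the Windows headers and has more members. */
typedef struct {
    uint32_t size;        /* plays the role of dbcc_size */
    uint32_t devicetype;  /* plays the role of dbcc_devicetype */
    char     name[1];     /* plays the role of dbcc_name */
} fake_broadcast;

/* Allocate the structure plus room for the string that follows it.
   The one-element array already pays for the null terminator. */
fake_broadcast *fake_broadcast_new(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    size_t total = sizeof(fake_broadcast) + len;
    fake_broadcast *msg = malloc(total);
    if (msg == NULL)
        return NULL;
    msg->size = (uint32_t)total;
    msg->devicetype = 0;
    memcpy(msg->name, name, len + 1);  /* the string starts at name[0] */
    return msg;
}

/* Recover the string length from the recorded size alone. */
size_t fake_broadcast_name_length(const fake_broadcast *msg)
{
    assert(msg != NULL);
    return msg->size - sizeof(fake_broadcast);
}
```

A caller would read `msg->name` like any zero-terminated string and release the whole thing with a single `free(msg)`.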

edited Jun 03 '20 at 15:47

answered Jun 03 '20 at 15:39

Konrad Rudolph


The explanation in parens is saying that the null terminator is not included in the part of the `dbcc_size` value that excludes `sizeof(_DEV_BROADCAST_DEVICEINTERFACE_A)`, because the null terminator's slot has been accounted for in the array dimension included in the result of `sizeof`. Yes, it's true that they _had_ to give it a non-zero dimension, but I don't think that makes the statement misleading. I would probably argue, though, that it's an irrelevant implementation detail that distracts from the useful information. – Asteroids With Wings Jun 03 '20 at 15:43
@AsteroidsWithWings Fair enough – Konrad Rudolph Jun 03 '20 at 15:45
Thanks. I'm not sure where I remembered the 16 bits from. Anyway, as a C# dev, I assumed that the `char` at the beginning of that line meant that the type _is_ char. According to your answer - `dbcc_name, which has array type` it seems that the brackets mean that it's really an array of chars. Did I understand you correctly? – ispiro Jun 03 '20 at 15:45
@ispiro Declaration syntax in C# is different from C and C++. In C and C++, you declare a static array as shown here, i.e. `Type name[Size]`. In hindsight that’s idiotic, which is why C# changed it to be sane (`Type[] name`). The original idea behind C’s declaration syntax was that declarations mirror usage: the type of `name[0]` is `Type` (i.e. the type of `dbcc_name[0]` is `char`). – Konrad Rudolph Jun 03 '20 at 15:48

score 3 · Answer 2 · answered Jun 03 '20 at 16:56

This is what is known as the "struct hack". It's a trick that allows you to store variably-sized data in a struct instance.

You make the last member an array of size 1, like so:

struct foo { int i; char c[1]; };

Assuming a 4-byte int, this type is 5 bytes wide (although it will likely take up 8 bytes to satisfy any alignment requirements), and an instance of struct foo would look like this:

   +---+
i: |   |
   +---+
   |   | 
   +---+
   |   |
   +---+
   |   | 
   +---+
c: |   |
   +---+

However, if you allocate memory for it dynamically with malloc or calloc, you can allocate more memory than the struct type needs, and that extra memory can then be accessed through the array (struct members are guaranteed to be laid out in the order declared, and C does not bounds-check array accesses).

struct foo *p = malloc( sizeof *p + strlen( "hello" )); 
p->i = 1;
strcpy( p->c, "hello" );

So we allocate enough memory for the struct type (5 bytes) plus enough memory to store "hello", which gives us (assuming little-endian)

   +---+ ----+
i: | 1 |     |
   +---+     |
   | 0 |     |
   +---+     |
   | 0 |     +---- size of struct foo
   +---+     |
   | 0 |     |
   +---+     |
c: |'h'|     |
   +---+ ----+
   |'e'|     |
   +---+     |
   |'l'|     |
   +---+     |
   |'l'|     +---- Extra memory for "hello"
   +---+     |
   |'o'|     |
   +---+     |
   | 0 |     |
   +---+ ----+
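Putting the fragments above together, a minimal sketch of the whole trick might look like this. `foo_new` is a helper name I made up for illustration; the allocation size follows the `sizeof *p + strlen(...)` expression used above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct foo { int i; char c[1]; };

/* Allocate a struct foo with a copy of text stored right behind it.
   sizeof *p already contains one char, which is where the terminator goes. */
struct foo *foo_new(int i, const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    struct foo *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->i = i;
    memcpy(p->c, text, len + 1);  /* copies the characters and the terminator */
    return p;
}
```

Because the string lives in the same allocation as the struct, one `free(p)` releases both.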

Why do we make c an array of size 1 instead of a pointer? If we made c a pointer, like

struct foo { int i; char *c; };

then this trick doesn't work, because all c can store is an address, not data.

C allows "flexible array members", where a size isn't needed in the declaration:

struct foo { int i; char c[]; };

However, C++ does not (yet) support this, so you have to specify a non-zero size.
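For comparison, here is a sketch of the same allocation with a C99 flexible array member. The names `foo_fam` and `foo_fam_new` are hypothetical. The one difference worth noticing is the explicit `+ 1`: the flexible member contributes no elements to `sizeof`, so the terminator has to be counted by hand. This compiles as C, not as standard C++.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct foo_fam { int i; char c[]; };  /* flexible array member, C99 and later */

/* Allocate a struct foo_fam followed by a copy of text. */
struct foo_fam *foo_fam_new(int i, const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    /* sizeof *p covers no elements of c, so add len + 1 for the string and terminator */
    struct foo_fam *p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    p->i = i;
    memcpy(p->c, text, len + 1);
    return p;
}
```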

The explanation why the trick doesn’t work with a pointer is not really correct: a `char[1]` can *also* not store a string (in fact, this is simply UB in C++, although I omitted mentioning this in my answer). The reason this works isn’t that `char[1]` can store more data than a pointer, but rather that if we used a pointer then C++ would interpret the bytes at `offsetof(foo, c)` as a pointer to some *other* memory location, not as the start of a string. Because it’s declared as `char[1]`, C++ performs array-to-pointer decay upon usage of `c`, and we end up with a pointer *to* `c[0]`. — Konrad Rudolph, Jun 03 '20 at 17:00
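A small sketch of the distinction made in this comment, using two hypothetical struct and function names of my own. With an array member, using `c` gives the address of `c[0]`, which lies inside the struct; with a pointer member, `c` gives whatever address happens to be stored there.

```c
#include <assert.h>
#include <stddef.h>

struct with_array   { int i; char c[1]; };
struct with_pointer { int i; char *c; };

/* Array member: `a->c` decays to a pointer to c[0], i.e. the address
   of the member itself, so this always returns 1. */
int array_member_is_inline(const struct with_array *a)
{
    assert(a != NULL);
    return (const char *)a + offsetof(struct with_array, c) == a->c;
}

/* Pointer member: `p->c` is the stored address, which normally points
   at some other memory location, so this normally returns 0. */
int pointer_member_is_inline(const struct with_pointer *p)
{
    assert(p != NULL);
    return (const char *)p + offsetof(struct with_pointer, c) == p->c;
}
```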

score 2 · Answer 3 · answered Jun 03 '20 at 15:46

It is a very old trick.

typedef struct 
{
    size_t size;
    char str[1];
} mystring_t;

/* Room for `size` bytes in str; one of them is already counted in sizeof(mystring_t). */
mystring_t *allocate(size_t size)
{
    return malloc(sizeof(mystring_t) + size - 1);
}

Then you can easily reallocate it as the flexible part is always at the end of the struct.
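As a rough sketch of that reallocation step (the helper name `mystring_resize` is my own, and it assumes `size` counts every byte of `str` including any terminator):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    size_t size;
    char str[1];
} mystring_t;

/* Grow or shrink the buffer so that str can hold `size` bytes.
   Passing NULL for s allocates a fresh object.
   Returns NULL on failure and leaves the original allocation untouched. */
mystring_t *mystring_resize(mystring_t *s, size_t size)
{
    assert(size >= 1);
    mystring_t *resized = realloc(s, sizeof(mystring_t) + size - 1);
    if (resized == NULL)
        return NULL;
    resized->size = size;
    return resized;
}
```

Since the header and the characters share one block, a single `realloc` moves both together and the existing contents of `str` are preserved up to the smaller of the old and new sizes.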

In newer C versions (C99 and later), a flexible array member does the same job:

typedef struct 
{
    size_t size;
    char str[];
} mystring_t;

or, using GCC's zero-length array extension:

typedef struct 
{
    size_t size;
    char str[0];
} mystring_t;
