58

In C++, what is the difference (if any) between using char and char[1]?

Example:

struct SomeStruct
{
   char x;
   char y[1];
};

Do the same reasons follow for unsigned char?

  • 19
    The amount of wrong answers to this question is astounding. +1 for asking something apparently people misunderstand often. – Billy ONeal Nov 08 '10 at 01:30
  • 2
    I read this question and could almost hear the hundreds of keyboards furiously typing to be the first to answer. – Anthony Nov 08 '10 at 01:36
  • 2
    @Billy, @Duracell: Unfortunately, most C and C++ programmers do not really understand arrays. On the one hand, it's somewhat surprising, considering how fundamental they are; on the other hand, array decay and the way subscripting works make it easy for beginners to think that arrays are the same thing as pointers. – James McNellis Nov 08 '10 at 05:53

3 Answers

39

The main difference is just the syntax you use to access your one char.

By "access" I mean acting upon it using the various operators in the language, most or all of which do different things when applied to a char compared with a char array. This makes it sound as if x and y are almost entirely different. In fact they both "consist of" one char, but that char has been represented in a very different way.

An implementation could introduce other differences; for example, it could align and pad the structure differently depending on which one you use. But I doubt it will.

An example of the operator differences is that a char is assignable, and an array isn't:

SomeStruct a;
a.x = 'a';
a.y[0] = 'a';
SomeStruct b;
b.x = a.x; // OK
b.y = a.y; // not OK: arrays are not assignable
b.y[0] = a.y[0]; // OK

But the fact that y isn't assignable doesn't stop SomeStruct being assignable:

b = a; // OK

All this is regardless of the type, char or not. An object of a type, and an array of that type with size 1, are pretty much the same in terms of what's in memory.
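The points above can be checked at compile time. This is only a sketch, assuming a C++11 compiler and the `<type_traits>` header; the assertions restate what the answer says rather than add anything new.

```cpp
#include <type_traits>

struct SomeStruct
{
   char x;
   char y[1];
};

// x and y have different types, and y is an array, not a pointer.
static_assert(std::is_same<decltype(SomeStruct::x), char>::value, "x is a char");
static_assert(std::is_same<decltype(SomeStruct::y), char[1]>::value, "y is an array of one char");
static_assert(!std::is_pointer<decltype(SomeStruct::y)>::value, "y is not a pointer");

// Each member occupies exactly one char.
static_assert(sizeof(SomeStruct::x) == 1 && sizeof(SomeStruct::y) == 1, "one byte each");

// An array cannot be assigned to, but the struct that contains it can.
static_assert(!std::is_assignable<char (&)[1], char (&)[1]>::value, "arrays are not assignable");
static_assert(std::is_copy_assignable<SomeStruct>::value, "the struct is assignable");
```

No assertion is made about sizeof(SomeStruct), since padding is left to the implementation.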

As an aside, there is one context in which the choice between char and char[1] makes a big difference, and which sometimes helps confuse people into thinking that arrays are really pointers. It isn't your example; it's function parameters:

void foo(char c);     // a function which takes a char as a parameter
void bar(char c[1]);  // a function which takes a char* as a parameter
void baz(char c[12]); // also a function which takes a char* as a parameter

The numbers provided in the declarations of bar and baz are completely ignored by the C++ language. Apparently someone at some point felt that it would be useful to programmers as a form of documentation, indicating that the function baz is expecting its pointer argument to point to the first element of an array of 12 char.

In bar and baz, c never has array type. It looks like an array type, but it isn't: it's just a fancy special-case syntax with the same meaning as char *c. That is why I put quotation marks around "use": you aren't really using char[1] at all, it just looks like it.
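The parameter adjustment can be confirmed with `decltype`. A minimal sketch, assuming C++11; `first_of` is a hypothetical helper added only for illustration.

```cpp
#include <type_traits>

void foo(char c);     // takes a char
void bar(char c[1]);  // adjusted to: void bar(char* c)
void baz(char c[12]); // adjusted to: void baz(char* c)

// bar and baz have exactly the same function type; the bounds are dropped.
static_assert(std::is_same<decltype(bar), void(char*)>::value, "bar takes char*");
static_assert(std::is_same<decltype(baz), void(char*)>::value, "baz takes char*");
static_assert(!std::is_same<decltype(foo), decltype(bar)>::value, "foo differs");

// Hypothetical helper: inside the function, c is a plain pointer.
char first_of(char c[12])
{
   static_assert(std::is_same<decltype(c), char*>::value, "c is a char*");
   return c[0];
}
```

Because the declarations are only inspected with `decltype`, foo, bar and baz do not need definitions here.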

Steve Jessop
  • 9
    +1, I'm tired of all the answers saying that `y` is a pointer. – casablanca Nov 08 '10 at 01:30
  • 4
    Nice and clear answer. Another thing that will be different is initialization with constructors (if he were to add one), which sadly doesn't work in current C++ with arrays (other than to value-initialize them). – Johannes Schaub - litb Nov 08 '10 at 01:56
  • 3
    The main difference is really just the *type* assigned by the type system. `x` is of type `char`, and `y` is of type `char[1]`. I know that sounds like I'm just stating the obvious, but that's really the main difference. You can't pass `y` to a function expecting a `char` as a parameter, for instance. But ultimately x and y will both return the same value when passed to the `sizeof` operator, and are both likely to be represented exactly the same way in memory. Also, `y` can decay to a pointer, meaning you can pass it to something like `strcpy`, but you can't do that with `x`. – Charles Salvia Nov 08 '10 at 01:58
  • 1
    I believe that ultimately @Charles has a very good point. The type is different and ultimately many operations will behave differently. For instance, both `!a.x` and `!a.y` will work, but mean something totally different. And `cout << a.x` probably does what was intended, but `cout << a.y` will just do nonsense. – Johannes Schaub - litb Nov 08 '10 at 02:03
  • Yes, my intent was to sweep all this under "the syntax you use to access your one char". But that doesn't quite cover it, so thanks for the extra examples. – Steve Jessop Nov 08 '10 at 02:05
  • Also, it's worth pointing out we can make `y` a null-terminated C-string, (of course, it can only be a zero-length C-string.) – Charles Salvia Nov 08 '10 at 02:08
  • @casablanca - most people use _pointer_ to mean a memory location; some apparently mean only _variables_ that _contain_ an address. **a.y** is the _memory address_ of a 1-char array, and probably one byte higher than the address of **a** (**a.x**). – NVRAM Nov 08 '10 at 02:10
  • 1
    Good answer. My snarky one-line answer to the OP was going to be that "x is a char, and y is a character array (whose size just happens to be 1); y is not a char, and it is not char *". I'll use this opportunity to plug one of my favorite books on C, [Peter van der Linden's "Expert C Programming"](http://www.amazon.com/Expert-Programming-Peter-van-Linden/dp/0131774298), which discusses all the subtleties of pointers, arrays, array decaying, etc... – Dan Nov 08 '10 at 02:11
  • @Charles: I'd say that we can make `x` a NUL-terminated C string (of zero length). Certainly, if its value is 0 then a pointer to it is a valid parameter to the standard `str` functions (maybe not as `dst`) :-) – Steve Jessop Nov 08 '10 at 02:17
  • 1
    @Carter what you say is true in C, where the *full expression* `a.y` is such an address (because of the automatic, unconditional, decay). But it is not true in C++, where the *full expression* `a.y` is not an address, but actually an array (because the decay is not unconditional). It's wrong in both languages to claim that the variable `a.y` would be a pointer or would store an address. But it only makes sense to talk about pointers when you say that you talk about a suitable expression, instead of the variable `a.y` as such. – Johannes Schaub - litb Nov 08 '10 at 02:19
  • @Carter: Take a look at sizeof(a.y). (Hint: it's different from sizeof(char*).) –  Nov 08 '10 at 02:27
  • 1
    @Carter further, I've no problem saying that `this` is a pointer, even though it is not an object that contains an address (i heard someone in usenet say that it is not a pointer, because he said only objects can be pointers, and `this` would only be a pointer *value*. I think at that level, that's really hair splitting and of no sense). So this has nothing to do with "most people use pointer to mean a memory location; some apparently mean only variables that contain an address", at least not concerning me. – Johannes Schaub - litb Nov 08 '10 at 02:29
  • @litb: 5.1/3 says that `this` is the name of a pointer, so certainly there are "pointers" (not merely "pointer values", if that's a different thing according to usenet guy) which are not objects. So "really hair splitting" and also "wrong", I think :-) Where it's necessary to distinguish I'd probably do the reverse - call "pointers" the things he calls "pointer values", and call "pointer objects" the things he calls "pointers". Likewise `5` is an int, `int x;` means `x` is an int, or an "int object" or "int lvalue" if necessary to avoid ambiguity. – Steve Jessop Nov 08 '10 at 02:46
  • 1
    @Johannes - indeed **a.y** does not store an address, and can't be altered directly (it's not an lval). If you reread my comment, I don't think that's implied. I think Dan's idea of suggesting further reading for the OP or subsequent viewers is good, since there's an enormous difference between arrays in C/C++ and pointerless languages (Java, C#, Javascript, etc.). I think we could go overboard being precise with our words - it's difficult to avoid being too verbose and yet never be misleading, depending on the reader. – NVRAM Nov 08 '10 at 02:58
  • `y` is a `const` pointer, it is still a pointer. If you disagree, think `&a` -- it can't be altered, but it is still a pointer. – J-16 SDiZ Nov 08 '10 at 03:17
  • 1
    @J-16: It is not a const pointer. This is precisely what so many people have pointed out here today. –  Nov 08 '10 at 06:13
  • @Roger Pate, and I disagree with them. Claiming `a.y` is not a `const pointer` is like claiming `1` is not an `int`. lvalue or not does not affect its type. – J-16 SDiZ Nov 08 '10 at 06:47
  • 2
    @J-16: What is the size of a char pointer (on your machine)? What is the size of a.y? `char* p; char a[1]; assert(sizeof(p) == sizeof(a));` will *fail;* this is all about type, not about lvalue or rvalue. –  Nov 08 '10 at 06:54
  • @J-16: http://www.it.usyd.edu.au/~dasymond/mirror/c-faq/aryptr/index.html may help, although it refers to C rather than C++. There are some very fine details which are different between the two, but the basic points are all the same. Then again, if you're going to disagree with everyone here (except Carter), maybe you disagree with the C-FAQ too. If you've read the C++ standard, and believe that it states arrays are pointers, perhaps let us know which passages it is you think say that. It may be a simple misunderstanding. – Steve Jessop Nov 08 '10 at 11:14
  • 1
    Okay, I was wrong. It is the fact that both types implicitly cast to each other that confused me. – J-16 SDiZ Nov 08 '10 at 11:27
11

If you've actually seen the construct char y[1] as the last member of a struct in production code, then it is fairly likely that you've encountered an instance of the struct hack.

That short array is a stand-in for a real but variable-length array (recall that before C99 there was no such thing in the C standard). The programmer would always allocate such structs on the heap, taking care to ensure that the allocation was big enough for the actual size of array that he wanted to use.
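A rough sketch of how the hack is typically used follows. `Message` and `make_message` are hypothetical names chosen for illustration, not taken from any real API. Indexing past the declared bound of `text` relies on behaviour that the C++ standard does not guarantee, so treat this as a description of existing practice rather than a recommendation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical message type; the one-element array marks where the
// variable-length payload begins.
struct Message
{
   std::size_t length;
   char text[1]; // stand-in for length + 1 chars
};

// Allocates one block holding the header and a copy of s.
// The caller releases it with std::free.
Message* make_message(const char* s)
{
   const std::size_t n = std::strlen(s);
   // sizeof(Message) already includes one char of text (used for the
   // terminating NUL), so n extra bytes are enough for the payload.
   Message* m = static_cast<Message*>(std::malloc(sizeof(Message) + n));
   if (m == nullptr)
      return nullptr;
   m->length = n;
   std::memcpy(m->text, s, n + 1); // copies the NUL as well
   return m;
}
```

The whole object lives in a single allocation, which is why such structs are always created on the heap and never as local or member variables.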

dmckee --- ex-moderator kitten
  • Steve: Unless that C++ code needs to talk with C code. For example, C++ code consuming Win32 APIs. – Billy ONeal Nov 08 '10 at 01:33
  • @Billy: sure, if the hack appears in a struct defined by some C API, that's OK. I guess the hack is still "in C++ code", in the sense that the header becomes C++ as soon as you `#include` it in a C++ program, so all right, it's an exception to my rule. – Steve Jessop Nov 08 '10 at 01:39
  • 4
    @Billy: I think @Steve's advice still applies in that case, but for a different reason. Whenever you have to consume Win32 APIs, worry... ;) – jalf Nov 08 '10 at 02:25
  • @jalf: Perhaps I should have just said C APIs. POSIX has a few such APIs that do this I'm sure. – Billy ONeal Nov 08 '10 at 03:06
  • @Billy: `struct dirent`, for example. – Steve Jessop Nov 08 '10 at 11:11
3

As well as the notational differences in usage emphasised by Steve, char[1] can be passed to e.g. template <int N> void f(char(&a)[N]), where char x = '\0'; f(&x); wouldn't match. Reliably capturing the size of array arguments is very convenient and reassuring.
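A minimal sketch of such a template is shown here. It uses `std::size_t` for `N` rather than the `int` written above, and returns the deduced bound so the effect is visible; both choices are assumptions made for the example.

```cpp
#include <cstddef>

// Accepts only a reference to an array of char; N is deduced from the
// array's type, so the size cannot be lost or misreported.
template <std::size_t N>
std::size_t f(char (&a)[N])
{
   (void)a; // the array itself is not needed to report its size
   return N;
}

// Usage:
//   char y[1] = {'\0'};
//   f(y);       // OK, N is deduced as 1
//   char x = '\0';
//   f(&x);      // does not compile: char* is not a reference to an array
```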

It may also imply something different: either that the real length may be longer (as explained by dmckee), or that the content is logically an ASCIIZ string (that happens to be empty in this case), or an array of characters (that happens to have one element). If the structure was one of several related structures (e.g. a mathematical vector where the array size was a template argument, or an encoding of the layout of memory needed for some I/O operation), then it's entirely possible that some similarity with other fields where the arrays may be larger would suggest a preference for a single-character array, allowing support code to be simpler and/or more universally applicable.

Tony Delroy