Why don't char arrays with separate chars end with a null-terminator unlike string literals?

Question

I was playing around with char arrays in C++ and wrote this program:

#include <iostream>

using namespace std;

int main()
{

char text[] = { 'h', 'e', 'l', 'l', 'o' };  //arrays initialised like this 
                                            //will have a size of the number 
                                            //of elements that you see

char text2[] = "hello"; //arrays initialised like this will have a size of 
                        //the number of elements that you see + 1 (a 0 on the 
                        //end to show where the end is)

cout << endl;

cout << "The size of the first array is: " << sizeof(text) << endl;

cout << endl;

for (int i = 0; i < sizeof(text); i++)
{
    cout << i << ":" << text[i] << endl;
}
cout << endl;

cout << "The size of the second array is: " << sizeof(text2) << endl;

cout << endl;

for (int i = 0; i < sizeof(text2); i++)
{
    cout << i << ":" << text2[i] << endl;
}
cout << endl;

cin.get();

return 0;
}

This program gives me the output:

The size of the first array is: 5

0:h
1:e
2:l
3:l
4:o

The size of the second array is: 6

0:h
1:e
2:l
3:l
4:o
5:

My question is: Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0) on the end unlike initializing a char array with a string literal?

It would be rather annoying if each `char` array had a null implicitly added, while for string literals that's just what you want – 463035818_is_not_an_ai, Apr 06 '18 at 15:12
It is just the way the language works. When you take control and specify what you want (`{ 'h', 'e', 'l', 'l', 'o' }`), that is what you get. — NathanOliver, Apr 06 '18 at 15:12
Nice observation! I guess the answer is, "What if I actually want an array of `char`s that isn't a string? How could I get that otherwise?" — BoBTFish, Apr 06 '18 at 15:13
Because sometimes you want an array of bytes instead of "characters"? It really depends on the use case, so the compiler can't make any assumptions. – Some programmer dude, Apr 06 '18 at 15:13
Maybe what causes your confusion is that not every `char` array is used to store character sequences. `char` is basically just a type like `int` or `float` that can hold some values. Being used as a string is just one use case, though a very common one – 463035818_is_not_an_ai, Apr 06 '18 at 15:14
Odd duplicate that Community spotted, no? That did not mention the explicit char array. — Bathsheba, Apr 06 '18 at 15:22
@Bathsheba The answer did, albeit maybe not as directly as you like: https://stackoverflow.com/a/40821770/2757035 — underscore_d, Apr 06 '18 at 15:33
@underscore_d: Odd policy that. I could create a question "what is the C++ standard", answer it with a verbatim copy of the C++ standard, and close *every* C++ question to that answer. For me a duplicate has to be "the question is an exact duplicate of this question". Disk is cheap. — Bathsheba, Apr 06 '18 at 15:35

score 4 · Accepted Answer · edited Apr 06 '18 at 15:16

A brace initializer just provides the specified values for an array (or, if the array is larger, the remaining items are value-initialized, which for char means zero). It's not a string even if the items are char values. char is just the smallest integer type.

A string literal denotes a zero-terminated sequence of values.

That's all.

edited Apr 06 '18 at 15:16

Bathsheba


answered Apr 06 '18 at 15:14

Cheers and hth. - Alf


s/signed integer type/integer type since we don't know what the signedness of `char` is. – NathanOliver Apr 06 '18 at 15:15
Just in case Cheers and hth. - Alf is now in the pub, I've made a cheeky edit. – Bathsheba Apr 06 '18 at 15:17
Okay that makes sense, thanks everyone for the comments/answers. – Chris Gray Apr 06 '18 at 15:31

score 1 · Answer 2 · answered Apr 06 '18 at 15:15

Informally, it's the second quotation character in a string literal of the form "foo" that adds the NUL-terminator.

In C++, "foo" is a const char[4] type, which decays to a const char* in certain situations.

It's just how the language works, that's all. And it's very useful since it dovetails nicely with all the standard library functions that model a string as a pointer to the first element in a NUL-terminated array of chars.

Splicing in an extra element with something like char text[] = { 'h', 'e', 'l', 'l', 'o' }; would be really annoying and it could introduce inconsistency into the language. Would you do the same thing for signed char, and unsigned char, for example? And what about int8_t?

score 1 · Answer 3 · answered Apr 06 '18 at 15:15

You can terminate it yourself in multiple ways:

char text1[6] = { 'h', 'e', 'l', 'l', 'o' };              // one extra element, which is zero-initialized
char text2[sizeof "hello"] = { 'h', 'e', 'l', 'l', 'o' }; // sizeof "hello" is 6, so same as above
char text3[] = "hello"; // <--- my personal favourite

edited Apr 06 '18 at 15:20

answered Apr 06 '18 at 15:15

Maxim Egorushkin


score 1 · Answer 4 · answered Apr 06 '18 at 15:17

A string literal such as "hello" has the type of a constant character array and is initialized as if by the following declaration:

const char string_literal_hello[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

As you can see, the type of the string literal is const char[6]: it contains six characters.

Thus this declaration

char text2[] = "hello";

which can also be written as

char text2[] = { "hello" };

is in effect equivalent to the following declaration

char text2[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

That is, when a string literal is used as an initializer of a character array, all of its characters, including the terminating zero, are used to initialize the array.

score 1 · Answer 5 · answered Apr 06 '18 at 15:22

Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0)

The reason is that this syntax...

Type name[] = { comma separated list };

...is used for initializing arrays of any type. Not just char.

The "quoted string" syntax is shorthand for one specific kind of array, for which a null terminator is assumed to be wanted.

score 0 · Answer 6 · answered Apr 06 '18 at 15:17

When you write a set of adjacent characters delimited by double quotes (a string literal), it is assumed that what you want is a string. A string in C is a null-terminated array of characters, because that's what the functions that operate on strings (printf, strcpy, etc.) expect. So the compiler automatically adds the null terminator for you.

When you provide a brace-delimited, comma-separated list of single-quoted characters, it is assumed that you don't want a string, but an array of the exact characters you specified. So no null terminator is added.

C++ inherits this behavior.

Note though that in C, `"foo"` is a `char[4]` type although it's UB to try to modify it. Also note that `'h'` is an `int` type in C. In other words, the languages diverge so much in this area, I avoided making the comparison. — Bathsheba, Apr 06 '18 at 15:21
