C++ and C-string variables, char array size

Question

I've searched through many articles on here and have self-tested this concept with my own code. My question is to satisfy my own curiosity and maybe help others, as I cannot find an answer describing this concept in particular. My textbook (teaching C++) describes a C-string variable:

A C-string variable is an array of characters. The following array declaration provides a C-string variable "s" capable of storing a C-string value with nine or fewer characters:
char s[10];
The 10 is for the nine letters in the string plus the null character '\0' to mark the end of the string. Like any other partially filled array, a C-string variable uses positions starting at indexed variable 0 through as many as are needed.

I'm trying to understand the above. If the array size is 10, wouldn't the total storage size be 11? i.e. 0-10 = 11 spaces. If the \0 character occupies one space, then we'd still be able to store 10 characters and not 9 as per the book.

In my own testing, I declared a character array test[4] and stored the word "cat" in the array. When looking at individual positions within the array, I can see individual characters at each index i.e:

test[0] = c
test[1] = a
test[2] = t
test[3] =  
test[4] =

Why do we need 2 additional slots in the character array and not 1?

Please have a look at this [C++ books](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) list. Some books are better than others. — Ron, Oct 09 '17 at 12:35
`array_name[size]` means the valid indexes are `[0, size)`, not `[0, size]`. Sounds like you could use a [good C++ book](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) — NathanOliver, Oct 09 '17 at 12:36
The book is inaccurate. To be a C-string, the array content must satisfy the nul termination property. But `s` itself is just an array. It is not a C-string, but it may hold one. — StoryTeller - Unslander Monica, Oct 09 '17 at 12:37
When you define the array you specify the number of elements. The indexes then go from `0` to number of elements minus one (`9` in this specific case). — Some programmer dude, Oct 09 '17 at 12:37
Talking about C string variables may be misleading. C only has string literals, char arrays, and pointers to char, as well as string functions. But it doesn't really have string variables at language level. It's purely up to programmer to handle some char pointer or array as "string variable". — hyde, Oct 09 '17 at 12:38
@StoryTeller: The book is fairly accurate but misleading. Parse it as "`char s[10];` is C (string variable)". It's uninitialized, so the variable doesn't actually hold a C string value, but it could. Hyde has a good point in the previous comment, though: In the core language, C strings are an abstraction, a matter of convention. They only become a reality in the Standard Library. — MSalters, Oct 09 '17 at 12:46
@MSalters - Aren't all data formats just that? I'm not saying C-strings are something special, only that it's a property of the data, not the variable. — StoryTeller - Unslander Monica, Oct 09 '17 at 12:47
@StoryTeller: Well, the core language _does_ define the properties of integers, especially unsigned integers. There's no need to have an `intcpy()` or `intcmp()` in the Standard Library. But strictly speaking, I'd say these are properties of _types_. An _object_ combines _memory_ and type, and a _variable_ gives an object a _scope_ and a _name_. – MSalters, Oct 09 '17 at 13:01
"provides a C-string variable "s" capable of storing a C-string value with nine or fewer characters:" is off by 1. A string in C includes the _null character_, so `s` is capable of storing a _string_ of up to 10 characters, the last being the null character. – chux - Reinstate Monica, Oct 09 '17 at 13:07

score 5 · Answer 1 · answered Oct 09 '17 at 12:37

An array with size N has indexes starting at 0 and ending with N-1. It does not have an element with index N.

With your example of char test[4], the array has indexes 0, 1, 2, and 3. Attempting to access index 4 goes off the end of the array. C and C++ do not prevent you from doing so, and attempting it invokes undefined behavior.
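As a rough sketch of the point above, the following C++ derives the element count and the last valid index from the array type itself. The helper names `element_count`, `last_index`, and `fill_test_array` are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a built-in array, deduced from its type.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// Index of the last valid element of a non-empty built-in array.
template <typename T, std::size_t N>
constexpr std::size_t last_index(const T (&)[N]) { return N - 1; }

// Writes to every valid slot of a char[4] and returns how many slots were written.
inline std::size_t fill_test_array() {
    char test[4];
    std::size_t written = 0;
    for (std::size_t i = 0; i < element_count(test); ++i) {  // i = 0, 1, 2, 3
        test[i] = 'x';
        ++written;
    }
    // test[4] is deliberately never touched: it would be one past the end.
    assert(test[last_index(test)] == 'x');
    return written;
}
```

Calling `fill_test_array()` writes four elements, and for the textbook's `char s[10]` the same helpers report 10 elements with a last index of 9.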

dbush

This cleared up my confusion. The character array topic is the first time arrays are introduced in the book, and the concepts weren't elaborated on. Knowing that the size is N-1 makes much more sense now. Thank you, dbush – Christiaan Oct 09 '17 at 12:50
@Christiaan If this helped then please consider accepting it as answer. – Ron Oct 09 '17 at 12:52
@Christiaan The size is *not* N-1. The size is N, but the index of the Nth element is actually N-1. – dbush Oct 09 '17 at 12:54

Raindrop7 · Answer 2 · 2017-10-09T13:06:13.627

You can think of an array as a group of variables of the same type and size, stored consecutively in memory.

Arrays are indexed from 0 for the first element to n - 1 for the last element, so you can access any element by its index.

Accessing an array with an index i >= n or a negative index i < 0 results in undefined behavior.

An array of characters must contain a null character \0 after the last character if it is to be used as a C-string.

Here is an example:

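char c[5] = "Hello"; // Error in C++: the literal needs 6 elements (C accepts it, but c is then not a string)

"Hello" has 5 characters plus the terminating \0, so it needs 6 bytes. To correct it:

char c[6] = "Hello"; // Null character added automatically
// char c[] = "Hello"; // also works: the size is deduced as 6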

Look at this example:

char text[6] = {'H', 'e', 'l', 'l', 'o', '\0'};

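With this kind of initialization you must make sure the array ends with a null terminator '\0'. Writing it explicitly, as above, is the clearest way. If you omit it but the array still has 6 elements, the remaining element is zero-initialized; an array declared with only 5 elements would have no terminator, and printing it as a string would read past the end.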

std::cout << text[0]; // H which is the first element
std::cout << text[6 - 1 - 1]; // o, the last character before the terminator

Arrays of types other than char do not need a null terminator. The number of elements is n, and the indexing is the same, 0 through n - 1:
```
int array[5] = {4, 5, 9, 22, 16};
std::cout << array[0];  // 4
std::cout << array[5 - 1]; // 16 
```
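To make the initialization rules above concrete, here is a small C++ sketch. The function names are invented for this example; the behavior it checks (zero-filling of elements without an initializer, and the terminator counted in a string literal's size) is standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True if the element left without an initializer is zero.
inline bool partial_init_is_zero_filled() {
    char text[6] = {'H', 'e', 'l', 'l', 'o'};  // five initializers, six elements
    return text[5] == '\0';                     // the sixth element is zero-initialized
}

// Size in bytes of an array whose bound is deduced from the literal.
inline std::size_t deduced_size() {
    char c[] = "Hello";  // the literal has type const char[6]: five letters plus '\0'
    return sizeof c;
}

// Length as seen by the string functions: characters before the terminator.
inline std::size_t deduced_length() {
    char c[] = "Hello";
    return std::strlen(c);
}

// Exactly five elements and no terminator: a valid array, but not a C-string.
inline std::size_t five_chars_no_terminator() {
    char raw[5] = {'H', 'e', 'l', 'l', 'o'};
    // Passing raw to std::strlen would read past the end (undefined behavior),
    // so only sizeof is used here.
    return sizeof raw;
}
```

The array holding "Hello" occupies 6 bytes while its string length is 5, which is the one-element difference the textbook describes.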

edited Oct 09 '17 at 13:06

answered Oct 09 '17 at 12:52

Raindrop7

"Arrays of characters need to set the last element as a NULL character \0" is only necessary if the array is to be treated as a _string_. `char c[5] = "Hello";` is not an error, yet in this case `c[]` is not a string. – chux - Reinstate Monica Oct 09 '17 at 13:09
@chux: Isn't it an overflow? – Raindrop7 Oct 09 '17 at 13:10
`char text[6] = {'H', 'e', 'l', 'l', 'o'};` will _not_ get a garbage character at the end. An incomplete initialization will fill the remaining elements with zeros. – chux - Reinstate Monica Oct 09 '17 at 13:12
`char c[5] = "Hello";` is not an overflow. Only the first 5 characters of `"Hello"` are used. – chux - Reinstate Monica Oct 09 '17 at 13:13
@chux: `char c[5] = "Hello";` doesn't even compile! Error: a value of type "const char [6]" cannot be used to initialize an entity of type "char [5]". – Raindrop7 Oct 09 '17 at 13:14
Perhaps we are addressing the differences between C and C++. Sadly the post is tagged both ways. `char c[5] = "Hello";` does compile in C. – chux - Reinstate Monica Oct 09 '17 at 13:15
@chux: I don't know if it compiles in `C`; I only tried it in `C++`. Thanks. – Raindrop7 Oct 09 '17 at 13:18

Roddo · Answer 3 · 2017-10-11T10:18:12.507

I think your problem comes from the misconception of how you access the memory.

s[n] accesses the value located n elements past the address s; it can also be written *(s + n).

So basically, declaring

char s[4];

and storing "cat" in it gives this layout in memory:

     +---+
  s: | c | s[0] also *(s + 0)
     +---+
     | a | s[1] also *(s + 1)
     +---+
     | t | s[2] also *(s + 2)
     +---+
     |\0 | s[3] also *(s + 3)
     +---+
     | ? | s[4] also *(s + 4)
     +---+
     | ? | s[5] also *(s + 5)
     +---+
     | ? | s[6] also *(s + 6)
     +---+
     | ? | s[7] also *(s + 7)
     +---+
     | ? | s[8] also *(s + 8)
     +---+
     | ? | s[9] also *(s + 9)
     +---+

Each ? stands for a memory location outside the array whose value is unknown.

The language does not stop you from reading or even modifying these locations, but doing so is undefined behavior.

In your example, the value read from s[4] can change each time the program runs.

score 0 · Answer 4 · answered Oct 09 '17 at 12:37

char s[10]; has valid indices from 0 to 9, not 0 to 10 as you said.

Superman

score 0 · Answer 5 · answered Oct 09 '17 at 15:17

A C-string variable is an array of characters.

This is slightly misleading. A C string is a sequence of character values followed by a 0-valued terminator. For example, the string "Hello" is represented as the sequence {'H','e', 'l', 'l', 'o', 0}. The presence of the 0 terminator makes the character sequence a string. All C string handling functions (strcat, strcmp, strcpy, strchr, etc.) assume the presence of that terminator; if the terminator isn't there, those routines will not function properly.
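A minimal C++ sketch of the difference between the array's capacity and the length of the string stored in it; `StringInfo`, `describe_cat`, and `describe_full` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct StringInfo {
    std::size_t capacity;  // number of elements in the array
    std::size_t length;    // characters before the first '\0'
};

// A short string in a larger array: strlen stops at the terminator.
inline StringInfo describe_cat() {
    char s[10] = "cat";  // 'c', 'a', 't', '\0', then six more zero elements
    return {sizeof s, std::strlen(s)};
}

// The longest string char s[10] can hold: nine characters plus '\0'.
inline StringInfo describe_full() {
    char s[10] = "123456789";
    return {sizeof s, std::strlen(s)};
}
```

In both cases the capacity is 10; the length is 3 for "cat" and 9 for the nine-digit string, because `std::strlen` counts up to, but not including, the terminator.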

Strings are stored in arrays of character type (char for ASCII, EBCDIC, or UTF-8 strings or wchar_t for "wide" strings¹). Multiple strings may be stored in a single array if there's sufficient space. A 10-element array may store a single 9-character string, or two 4-character strings, or five one-character strings. Remember that to store an N-character string you need N+1 array elements to account for the terminator.
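The "two 4-character strings in a 10-element array" case can be sketched in C++ like this. The function name and the sample text "abcd" and "efgh" are assumptions chosen for the illustration.

```cpp
#include <cassert>
#include <cstring>

// True if both strings packed into one 10-element array read back intact.
inline bool two_strings_in_one_array() {
    // "abcd\0efgh" is nine characters plus the implicit final '\0': ten elements.
    char buf[10] = "abcd\0efgh";
    const char* first = buf;       // starts at buf[0]
    const char* second = buf + 5;  // starts just after the first terminator
    return std::strcmp(first, "abcd") == 0 &&
           std::strcmp(second, "efgh") == 0;
}
```

Each string needs its own terminator, so two 4-character strings use 2 * (4 + 1) = 10 elements, exactly filling the array.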

I'm trying to understand the above. If the array size is 10, wouldn't the total storage size be 11? i.e. 0-10 = 11 spaces.

Total storage size is 10, but array elements are indexed from 0 to 9. Given the declaration

char foo[10];

you get the following layout in memory:

     +---+
foo: |   | foo[0]
     +---+ 
     |   | foo[1]
     +---+
     |   | foo[2]
     +---+
     |   | foo[3]
     +---+
     |   | foo[4]
     +---+
     |   | foo[5]
     +---+
     |   | foo[6]
     +---+
     |   | foo[7]
     +---+
     |   | foo[8]
     +---+
     |   | foo[9]
     +---+

For any N-element array, individual elements are indexed from 0 through N-1.

Remember that in C, the array subscript operation a[i] is defined as *(a + i) - given a starting address a², offset i elements (not bytes!!!) from that address and dereference the result. The first element is stored at a, the second element is stored at a + 1, the third at a + 2, etc.
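The equivalence of a[i] and *(a + i), and the fact that the offset counts elements rather than bytes, can be checked with a short C++ sketch. The function names are hypothetical; an int array is used so that the element size differs from one byte.

```cpp
#include <cassert>
#include <cstddef>

// True if a[i] and *(a + i) refer to the same object for every valid index.
inline bool subscript_matches_pointer_form() {
    int a[4] = {10, 20, 30, 40};
    for (std::size_t i = 0; i < 4; ++i) {
        if (&a[i] != a + i || a[i] != *(a + i)) return false;
    }
    return true;
}

// Distance in bytes between neighbouring elements: sizeof(int), not 1.
inline std::size_t byte_stride() {
    int a[2] = {0, 0};
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(a + 1) -
        reinterpret_cast<const char*>(a));
}
```

Adding 1 to the pointer moves it by one whole element, so `byte_stride()` equals `sizeof(int)` even though the code only wrote `a + 1`.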

¹ wchar_t was introduced to represent character sets outside the ranges defined by ASCII or EBCDIC, so it's wider than the `char` type (often the width of two `char`s). With the advent of schemes like UTF-8 to represent non-English character sets, it's not that useful and I don't see it used very often.
² At some point, you're going to hear someone say "an array is just a pointer". This is not correct. Under most circumstances, an expression of array type will be converted ("decay") to an expression of pointer type, and the value of the expression will be the address of the first element in the array. The array object itself is not a pointer, nor does it set aside any space for a pointer value.

score 0 · Answer 6 · answered Oct 09 '17 at 15:20

char test[4] is the definition of your array. It means the array's size is four, and the only elements you can use are test[0], test[1], test[2], and test[3].

You can go to http://www.cplusplus.com/doc/tutorial/arrays/ to learn more about arrays in C++.
