Why are strings in C++ usually terminated with '\0'?

Question

In many code samples, people usually use '\0' after creating a new char array like this:

string s = "JustAString";
char* array = new char[s.size() + 1];
strncpy(array, s.c_str(), s.size());
array[s.size()] = '\0';

Why should we use '\0' here?

A C string, which is essentially a char array, must be NUL-terminated. Otherwise, the functions in `string.h` will not work as expected. — nhahtdh, Jun 08 '12 at 04:24
In C, you will see this a lot. In C++, there are probably better ways to get the same thing accomplished. — jedwards, Jun 08 '12 at 04:24
It is not for the compiler; it is for the libraries and possibly your own code. C does not support arrays properly. You can have local arrays, but there is no way to pass them around: if you try, you just pass the start address (the address of the first element). So you can either make the last element special, e.g. '\0', or always pass the size, being careful not to mess up. I use a set of macros to pass a (start address, length) pair. Structures are another way. Classes are the best way, but C does not have classes. — ctrl-alt-delor, Jun 08 '12 at 09:55
Related: http://stackoverflow.com/questions/4418708/whats-the-rationale-for-null-terminated-strings — Flexo, Jun 09 '12 at 18:29
Are std::string objects replacing C-style strings in C++? What is wrong with C strings, other than the memory management issues? — octopusgrabbus, Jun 11 '12 at 16:30

pb2q · Accepted Answer · 2022-04-12T13:52:39.570

47

The title of your question references C strings. C++ std::string objects are handled differently than standard C strings. \0 is important when using C strings, and when I use the term string in this answer, I'm referring to standard C strings.

\0 acts as a string terminator in C. It is known as the null character, or NUL, and standard C strings are null-terminated. This terminator signals code that processes strings - standard libraries but also your own code - where the end of a string is. A good example is strlen which returns the length of a string: strlen works using the assumption that it operates on strings that are terminated using \0.
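To make the role of the terminator concrete, here is a minimal sketch of a strlen-style loop. The name count_until_nul is a hypothetical helper for illustration, not the actual library implementation; it only shows that the function has nothing but the \0 to tell it where to stop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the real strlen): counts characters up to, but not
// including, the first '\0'. Without a terminator the loop would run off the
// end of the array.
std::size_t count_until_nul(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

Called on "JustAString" this counts 11 characters; anything stored after an earlier \0 in the same array is never seen.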

When you declare a constant string with:

const char *str = "JustAString";

then the \0 is appended automatically for you. In other cases, where you'll be managing a non-constant string as with your array example, you'll sometimes need to deal with it yourself. The docs for strncpy, which is used in your example, are a good illustration: strncpy copies over the null terminator character except in the case where the specified length is reached before the entire string is copied. Hence you'll often see strncpy combined with the possibly redundant assignment of a null terminator. strlcpy and strcpy_s were designed to address the potential problems that arise from neglecting to handle this case.

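In your particular example, array[s.size()] = '\0'; is not redundant: array is of size s.size() + 1, but strncpy is told to copy only s.size() characters, so it reaches its limit before the terminator of s.c_str() and does not append the \0. The explicit assignment is what terminates the string.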
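The behaviour of strncpy for a given count can be checked directly. The sketch below is an illustration only: strncpy_wrote_terminator is a hypothetical helper, and a std::vector<char> pre-filled with 'X' stands in for the question's new char[s.size() + 1] so that an untouched byte is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: reports whether strncpy alone put a '\0' at index
// s.size() of a buffer holding s.size() + 1 chars.
bool strncpy_wrote_terminator(const std::string& s, std::size_t count) {
    assert(count <= s.size() + 1);                 // never write past the buffer
    std::vector<char> buffer(s.size() + 1, 'X');   // 'X' marks untouched bytes
    std::strncpy(buffer.data(), s.c_str(), count);
    return buffer[s.size()] == '\0';
}
```

With a count of s.size() the helper returns false, because the copy stops before the source terminator is reached; with s.size() + 1 it returns true.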

The documentation for standard C string utilities will indicate when you'll need to be careful to include such a null terminator. But read the documentation carefully: as with strncpy the details are easily overlooked, leading to potential buffer overflows.

edited Apr 12 '22 at 13:52

answered Jun 08 '12 at 04:23

pb2q


So, how are strings in C++ terminated? I figured out that they are not NUL-terminated, because adding '\0' at an arbitrary index does not truncate the string as it would in C; it only replaces the character at that index with a null character. – CaptainDaVinci Mar 07 '17 at 16:04
@CaptainDaVinci They're not necessarily terminated since the length is stored internally. If you call `c_str()` then you'll get a properly terminated buffer, but only because you asked nicely. – tadman Dec 11 '17 at 18:35
@tadman is there an efficient way for `std::string` to implement `c_str()` other than by always keeping a NUL-terminator byte at the end of the string-object's internal byte-array? – Jeremy Friesner Apr 12 '22 at 13:52
@JeremyFriesner How else would you accomplish such a thing? In practice `std::string` might allocate slightly over the requested amount, it's implementation defined, so there could be zero padding there already. – tadman Apr 12 '22 at 14:36
@tadman agreed -- the only other way I can imagine would be to have `c_str()` dynamically allocate a separate buffer when it is called, with room for a NUL terminator, and return that... but of course that would be extremely inefficient for long strings, and likely lead to memory leaks. So my presumption is that every production-ready `std::string` implementation does in fact just store a NUL-terminated string internally, so that `c_str()` can simply return a pointer to it. – Jeremy Friesner Apr 12 '22 at 14:56
@JeremyFriesner Implementing `c_str()` with an allocation is not really ideal, so it's probably not done. If you're in a position where you can't assume, do tests, and explore how it behaves under varying string lengths from 1 to 1GB as some sizes might be "cursed". – tadman Apr 13 '22 at 15:50

Alok Save · Answer 2 · 2012-06-08T09:26:31.543

16

Why are strings in C++ usually terminated with '\0'?

Note that C++ strings and C strings are not the same.
In C++, string refers to std::string, which is a specialization of the class template std::basic_string and provides a lot of intuitive functions to handle the string.
Note that a C++ std::string does not rely on \0 to mark its end, since it stores its length, but the class provides functions such as c_str() to fetch the underlying string data as a \0-terminated C-style string.
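A short sketch of the difference, assuming nothing beyond the standard library: the names Lengths, measure and with_embedded_nul are made up for this illustration. A std::string keeps its own length, so a \0 in the middle is just another character, while C functions reading through c_str() stop at it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical pair of lengths for one string.
struct Lengths {
    std::size_t stored;  // what std::string::size() reports
    std::size_t c_view;  // what strlen sees through c_str()
};

Lengths measure(const std::string& s) {
    return { s.size(), std::strlen(s.c_str()) };
}

// Builds "JustAString" with the 'A' overwritten by '\0'.
std::string with_embedded_nul() {
    std::string s = "JustAString";
    assert(s.size() > 4);
    s[4] = '\0';
    return s;
}
```

For with_embedded_nul(), size() still reports 11 characters, while strlen on c_str() reports 4.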

In C, a string is a sequence of characters. This sequence ends with a \0.
Unless a special character like \0 is used, there would be no way of knowing where a string ends.
It is also aptly known as the string null terminator.

Of course, there could be other ways of bookkeeping to track the length of the string, but using a special character has two straightforward advantages:

It is more intuitive and
There are no additional overheads

Note that \0 is needed because most Standard C library functions operate on strings assuming they are \0 terminated.
For example:
When using printf() with %s, if you have a string which is not \0 terminated then printf() keeps writing characters to stdout until a \0 is encountered; in short, it might even print garbage.

Why should we use '\0' here?

There are two scenarios in which you do not need to \0 terminate a string yourself:

If you are explicitly keeping track of the length of the string in every usage, and
If you are using a standard library API that implicitly adds a \0 to the string.
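As an illustration of the first scenario, the sketch below formats characters that are not \0 terminated by passing the length explicitly through the "%.*s" precision. The helper name format_counted and the 64-byte scratch buffer are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats `length` characters starting at `data`.
// The data does not need a '\0' because the precision given to "%.*s"
// limits how many characters are read.
std::string format_counted(const char* data, std::size_t length) {
    assert(data != nullptr && length < 64);  // must fit the scratch buffer
    char out[64];
    std::snprintf(out, sizeof out, "%.*s", static_cast<int>(length), data);
    return std::string(out);
}
```

Here the caller does the bookkeeping: the function is only as safe as the length it is handed.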

In your case neither scenario applies, because strncpy() is given a count of s.size() and stops before the \0:

array[s.size()] = '\0';

The above statement is therefore needed in your example.

strncpy() copies at most s.size() characters to your array. Note that it appends null termination only if the source string is shorter than the count it is given; it knows nothing about the size of array. Here the count equals the length of the string, so no \0 is copied and the explicit assignment supplies it. Passing s.size() + 1 as the count would make strncpy() copy the \0 as well.

edited Jun 08 '12 at 09:26

answered Jun 08 '12 at 04:23

Alok Save


Not necessarily. You can also store an arbitrary-length array by keeping the length somewhere (like how Java works, I assume). – Brendan Long Jun 08 '12 at 04:25
@BrendanLong: Hope that answers. – Alok Save Jun 08 '12 at 04:28
@BrendanLong I'm assuming edit went through after that comment but as pointed out it removes additional overhead. To do it the way you're suggesting you would need to make a struct with an int as well as the array and that would offer worse performance and consume more memory. – evanmcdonnal Jun 08 '12 at 04:30
@evanmcdonnal More overhead, yes, but the idea that the null terminator has "no overhead" is untrue: it's one extra character (1-4 bytes). If you're using UTF-32 (for some reason), then they'd be exactly the same size. Storing the length is also much faster in any case where you need to look up the length (since with a null terminator, you need to walk all the way through the string to figure out its length). I'm just trying to point out that it's not "one way is obviously better". It's notable that C++ stores the length for strings and vectors. – Brendan Long Jun 08 '12 at 04:33
I also disagree with the "more intuitive" point, since storing the length of your data seems just as intuitive to me as using a sentinel value. – Brendan Long Jun 08 '12 at 04:34
@BrendanLong This is C, so it's most likely ASCII and therefore 1 byte, saving you three. Plus it saves you all the incrementing you would have to do whenever your string grew. The length point is a good one and may very well even it out (depends how often you care about length). Also, I agree with your point about it being intuitive. I prefer for loops to while loops, and that would allow for all for loops. – evanmcdonnal Jun 08 '12 at 04:40

score 6 · Answer 3 · answered Jun 08 '12 at 04:27

'\0' is the null termination character. If your character array didn't have it and you tried to do a strcpy from it, strcpy would keep reading past the end of the array and you could get a buffer overflow. Many functions rely on it to know when they need to stop reading or writing memory.

evanmcdonnal


score 4 · Answer 4 · answered Jun 08 '12 at 06:28

strncpy(array, s.c_str(), s.size());
array[s.size()] = '\0';

Why should we use '\0' here?

You shouldn't need to; that second line is a waste of space. strncpy already adds a null terminator if you know how to use it. The code can be rewritten as:

strncpy(array, s.c_str(), s.size()+1);

strncpy is sort of a weird function: it assumes that the first parameter is an array of the size given by the third parameter. So it only copies the null terminator if there is any space left after copying the string.

You could also have used memcpy() in this case, it will be slightly more efficient, though perhaps makes the code less intuitive to read.
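A minimal sketch of the memcpy() variant follows. The helper name copy_with_terminator is hypothetical, and a std::vector<char> is used here in place of the question's new[] purely as an assumption for the example, so that the buffer frees itself.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies the characters of s plus the terminating '\0'
// that c_str() guarantees at index s.size().
std::vector<char> copy_with_terminator(const std::string& s) {
    std::vector<char> buffer(s.size() + 1);
    std::memcpy(buffer.data(), s.c_str(), s.size() + 1);
    assert(buffer.back() == '\0');  // the terminator came along with the copy
    return buffer;
}
```

Because the count is s.size() + 1, no separate assignment of '\0' is needed, and unlike strcpy the copy does not stop at an embedded \0.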

Lundin


Or the other way around: strncpy being so weird perhaps makes the code less intuitive than the straightforward memcpy. But when I see code as shown above, my first reflex is usually to check whether copying data to an array could be avoided completely by using the c_str() content directly, because the final zero is often added to strings that won't be modified afterward (output strings). – kriss Aug 20 '12 at 20:36
Use `strcpy(array, &s[0]);` if you want to copy up to the first \0 (which is std::strlen(&s[0])+1 chars). Use `strncpy(array, &s[0], s.size()+1);` if you want to copy up to the first \0 and fill the rest with \0. Use `memcpy(array, &s[0], s.size()+1);` if you want to copy the given number of bytes from &s[0] (so an embedded \0 won't clear the rest of the string). – Puddle Jan 25 '19 at 14:35

score 2 · Answer 5 · answered Jun 08 '12 at 04:40

In C, we represent a string with an array of char (or wchar_t), and use a special character to signal the end of the string. As opposed to Pascal, which stores the length of the string at index 0 of the array (thus the string has a hard limit on the number of characters), there is theoretically no limit on the number of characters that a string (represented as an array of characters) can have in C.
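The Pascal-style layout mentioned above can be sketched as follows. This is an illustration under assumptions, not any particular compiler's format: ShortString, make_short_string and short_string_length are made-up names, and a one-byte length prefix is assumed, which is where the 255-character limit comes from.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical length-prefixed string: byte 0 holds the length, bytes 1..255
// hold the characters. No terminator is stored.
using ShortString = std::array<unsigned char, 256>;

ShortString make_short_string(const char* text) {
    const std::size_t n = std::strlen(text);
    assert(n <= 255);                       // hard limit of a one-byte length
    ShortString s{};
    s[0] = static_cast<unsigned char>(n);
    std::memcpy(s.data() + 1, text, n);
    return s;
}

// The length is read from index 0 instead of being found by scanning.
std::size_t short_string_length(const ShortString& s) {
    return s[0];
}
```

The trade-off is visible in the types: the length lookup is a single read, but the prefix width caps how long the string can be.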

The special character is expected to be NUL in all the functions of the C standard library, and in other libraries as well. If you want to use library functions that rely on the exact length of the string, you must terminate the string with NUL. You can define your own terminating character, but you must understand that library functions operating on strings (as arrays of characters) may not work as you expect, and this will cause all sorts of errors.

In the snippet of code given, there is a need to explicitly set the terminating character to NUL, since you don't know whether the newly allocated array contains garbage data. It is also good practice, since in a large code base you may not see the initialization of the array of characters.
