What is the internal structure of std::wstring? Does it include the length? Is it null terminated? Both?
-
http://www.cplusplus.com/forum/general/46247/ – Krishnabhadra Jul 30 '13 at 06:04
3 Answers
Does it include the length
Yes. It's required by the C++11 standard.
§ 21.4.4
size_type size() const noexcept;
1. Returns: A count of the number of char-like objects currently in the string.
2. Complexity: constant time.
Note, however, that this count is unaware of Unicode: it counts wchar_t elements (code units), not code points or user-perceived characters.
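To illustrate the point about size() and Unicode, here is a small sketch. The helper names `code_units` and `demo_size_counts_code_units` are made up for this example. It assumes only that both characters are in the Basic Multilingual Plane, so each takes one wchar_t regardless of whether wchar_t is 16 or 32 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns what size() reports, i.e. the number of
// wchar_t elements, not the number of user-perceived characters.
std::size_t code_units(const std::wstring& s) {
    return s.size();
}

// "e" followed by U+0301 (combining acute accent) is displayed as one
// accented letter, yet it occupies two wchar_t elements.
inline void demo_size_counts_code_units() {
    const std::wstring accented = L"e\u0301";
    assert(code_units(accented) == 2);
    // size() and length() are synonyms on basic_string.
    assert(accented.size() == accented.length());
}
```

Calling `demo_size_counts_code_units()` from `main` should pass both assertions.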
Is it null terminated
Yes. C++11 also requires that std::basic_string::c_str return a pointer that is valid over the range [0,size()], and my_string[my_string.size()] is a valid access that yields a null character.
§ 21.4.7.1
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1. Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2. Complexity: constant time.
3. Requires: The program shall not alter any of the values stored in the character array.
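The quoted guarantees can be checked directly. This is only a sketch against a C++11 or later library, and the function name `is_null_terminated` is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks the C++11 guarantees quoted above for an arbitrary wstring.
bool is_null_terminated(const std::wstring& s) {
    const wchar_t* p = s.c_str();
    // p + i == &operator[](i) for each i in [0, size()]
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }
    // The element at index size() is charT(), i.e. L'\0',
    // and data() returns the same pointer as c_str().
    return p[s.size()] == L'\0' && s.data() == p;
}
```

On a conforming C++11 implementation this should return true for any string, including an empty one.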

-
Well I learned C# by reading the spec, I might as well do the same for C++. Where can I get a copy of it? – Jonathan Allen Jul 30 '13 at 07:15
-
@JonathanAllen I wouldn't learn from the standard; it's full of standardese, so it'd be hard to read. However, you can find the C++14 CD [here](http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3690.pdf), along with multiple drafts and the standardisation process. – Rapptz Jul 30 '13 at 07:18
-
@JonathanAllen The best cost-free option is [open-std.org](http://www.open-std.org/JTC1/SC22/WG21/); you can't get the official version there, but you can get the latest draft which only differs in print-styling. It takes some searching to find which one is the newest-but-older-than-published, though. – Angew is no longer proud of SO Jul 30 '13 at 07:24
-
I don't know what you are comparing it to, but what you just posted is a heck of a lot easier to understand than the documentation on MSDN. – Jonathan Allen Jul 30 '13 at 07:26
-
@JonathanAllen If you want good human-readable documentation, I definitely recommend [this one](http://en.cppreference.com/w/Main_Page) – Rapptz Jul 30 '13 at 07:28
-
Not bad for a quick reference, but when learning something for the first time I prefer to read a book. Especially one that covers all of the nasty details. – Jonathan Allen Jul 30 '13 at 07:33
-
@JonathanAllen http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list has a list of books for learning C++. – Rapptz Jul 30 '13 at 07:34
-
There is no need for a string to have a null terminator until you call c_str(). – DanielKO Jul 30 '13 at 15:32
-
@DanielKO, how is c_str going to return a pointer to a null terminated string in constant time if that string doesn't already exist? It can't add a null terminator when c_str is called and it can't copy the original value in constant time. Therefore wstring must be null terminated internally. – Jonathan Allen Jul 30 '13 at 17:11
-
Rule 3, "The program shall not alter any of the values stored in the character array." (Plus that's just silly. What would the buffers be initialized to? Zeros.) – Jonathan Allen Jul 30 '13 at 17:15
-
You are misreading rule 3. It says you, the user of the c_str() method, are not allowed to change anything in there (as in, can't const_cast it to charT*). And it's not silly, you might never need c_str() in a pure C++ program, so why waste time writing the null terminator? In particular, if you have SSO, and all your strings are 1 or 2 characters long. – DanielKO Jul 31 '13 at 19:36
-
The standard doesn't require that the length be stored as a field. For example, if you just stored `wchar_t` head and tail pointers and took the difference, you could get the length in constant time without having to store the length – SheetJS Aug 20 '13 at 14:36
-
@Nirk: The length is still included as part of the information in the structure, even if that information is encoded as the difference between a pair of pointers. What it's illegal for an implementation to do is, say, use a null-terminated string and then use `strlen`. – Puppy Aug 20 '13 at 18:27
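A minimal sketch of the pointer-pair idea from the two comments above. The struct name `PointerPairString` and its members are invented for illustration. This says nothing about how any actual standard library lays out std::wstring.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, non-owning view of a character range.
struct PointerPairString {
    const wchar_t* head; // first character
    const wchar_t* tail; // one past the last character

    // Constant time: the length is the pointer difference, no scan needed.
    std::size_t size() const {
        return static_cast<std::size_t>(tail - head);
    }
};

inline void demo_pointer_pair() {
    static const wchar_t text[] = L"hello";
    const PointerPairString s{text, text + 5};
    assert(s.size() == 5);
}
```

The length is never stored as a field, yet it is still recoverable in O(1). Scanning for a terminator with a `strlen`-style loop would not be.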
We don't know. It's completely up to the implementation. (At least up until C++03; apparently C++11 requires the internal buffer to be 0-terminated.) If the C++ standard library implementation you are using is open source, you can have a look at its source code.
Apart from that, I'd find it logical if it was NUL-terminated and it stored an explicit length as well. This is good because then it takes constant time to return the length and a valid C string:
size_t length()
{
    return m_length;
}
const wchar_t *c_str()
{
    return m_cstr;
}
If it didn't store an explicit length, then size() would have to count the characters up to the NUL in O(n), which is wasteful if you can avoid it.
If, however, the internal buffer wasn't NUL-terminated, but it only stored the length, then it would be tedious to create a proper NUL-terminated C string: the string would have to either reallocate its storage and append the 0 (and reallocation is an expensive operation), or it would have to copy the entire buffer over, which is again an O(n) operation.
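For what it's worth, the "store both" design argued for here might look roughly like the following. The class name `SketchWString` is hypothetical, and real implementations differ considerably (small-string optimisation, allocators, capacity tracking and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <memory>

// Hypothetical class for illustration only.
class SketchWString {
public:
    explicit SketchWString(const wchar_t* src)
        : m_length(std::wcslen(src)),
          m_cstr(new wchar_t[m_length + 1]) {
        // Copy the characters plus the terminating NUL once, up front.
        std::wmemcpy(m_cstr.get(), src, m_length + 1);
    }

    std::size_t length() const { return m_length; }        // O(1)
    const wchar_t* c_str() const { return m_cstr.get(); }  // O(1)

private:
    std::size_t m_length;               // explicit length
    std::unique_ptr<wchar_t[]> m_cstr;  // buffer of m_length + 1 elements
};
```

The O(n) cost is paid once at construction. After that, neither length() nor c_str() has to scan, copy or reallocate.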
(Warning: shameless self-promotion - in a C language project I am currently working on, I've taken exactly this approach to implement flexible string objects.)
-
It is [guaranteed to be null-terminated in C++11](http://stackoverflow.com/questions/6077189/will-stdstring-always-be-null-terminated-in-c11). – Jesse Good Jul 30 '13 at 06:39
basic_string (of which wstring is a typedef) has no need for terminators.
Yes, it manages its own lengths.
If you need a null-terminated (aka C string) version of a string/wstring, call c_str(). Note that the string can contain a null character inside it, in which case pretty much every C function that handles C strings will fail to see the entire string.
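A quick sketch of the embedded-null caveat. The helper names `c_visible_length` and `demo_embedded_null` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Length as a C function sees it: scanning stops at the first L'\0'.
std::size_t c_visible_length(const std::wstring& s) {
    return std::wcslen(s.c_str());
}

inline void demo_embedded_null() {
    // Build a 7-element string with a null character in the middle.
    // The explicit count is needed; otherwise the constructor would
    // itself stop at the first null.
    const std::wstring s(L"abc\0def", 7);
    assert(s.size() == 7);             // the wstring knows its full length
    assert(c_visible_length(s) == 3);  // C code only sees "abc"
}
```

The wstring tracks all seven elements, while anything that relies on the terminator stops after the third.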

-
I'm afraid this doesn't answer the question. OP is asking about the **internal implementation** of the string; he presumably is very well aware of the `.c_str()` member function and knows why and when to use it. Also, I hope you know about the wide-string handling functions in the C standard library, such as `wcslen()`. – Jul 30 '13 at 06:27
-
Actually I'm a journalist trying to write about how Platform::StringReference works in conjunction with wchar_t* and wstring. Apparently StringReference "requires a null terminated string of type (wchar_t* or wstring)" to work without creating a copy. Or perhaps he means "requires a (null terminated string of type wchar_t*) or wstring". Too bad spoken words don't have parens. – Jonathan Allen Jul 30 '13 at 07:13
-
Yes, I so didn't answer his question, the chosen answer gave the exact same three answers as me, one hour later. Maybe I should just write a long prose without addressing the question, or just quote the standard while misinterpreting it. – DanielKO Jul 30 '13 at 15:27