Is there any difference between `std::wstring` and UTF-16 as string storage formats?
3 Answers
`std::wstring` is a container of `wchar_t`. The size of `wchar_t` is not specified: Windows compilers tend to use a 16-bit type, Unix compilers a 32-bit type.

UTF-16 is a way of encoding sequences of Unicode code points in sequences of 16-bit integers. Using Visual Studio, if you use wide character literals (e.g. `L"Hello World"`) that contain no characters outside of the BMP, you'll end up with UTF-16, but mostly the two concepts are unrelated. If you use characters outside the BMP, `std::wstring` will not translate surrogate pairs into Unicode code points for you, even if `wchar_t` is 16 bits.
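To see this concretely, here is a minimal sketch (assuming a C++11 compiler; the reported sizes depend on the platform's `wchar_t`):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

int main() {
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
    std::wstring s = L"\U0001D11E";

    // With a 16-bit wchar_t (e.g. MSVC), the literal is stored as the
    // UTF-16 surrogate pair 0xD834 0xDD1E: two elements, one code point.
    // With a 32-bit wchar_t (e.g. GCC on Linux), it is one element, 0x1D11E.
    std::cout << "elements: " << s.size() << '\n';
    for (wchar_t c : s)
        std::cout << std::hex << static_cast<std::uint32_t>(c) << '\n';
}
```

Either way, `size()` counts `wchar_t` elements, not characters; the surrogate pair is never collapsed for you.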
- Do you mean that `std::wstring` is the same as UTF-16 only for non-BMP Unicode characters when used on Windows? – hkBattousai Nov 22 '10 at 15:53
- No. `std::wstring` is just a container of integers. The encoding of the container depends entirely on the data you insert into the container. – JoeG Nov 22 '10 at 16:06
UTF-16 is a specific Unicode encoding. `std::wstring` is a string implementation that uses `wchar_t` as its underlying type for storing each character. (In contrast, regular `std::string` uses `char`.)

The encoding used with `wchar_t` does not necessarily have to be UTF-16; it could also be UTF-32, for example.
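A one-liner shows which case a given toolchain falls into (the printed value is implementation-defined; 2 and 4 are just the common results):

```cpp
#include <iostream>

int main() {
    // Typically 2 on Windows (UTF-16 code units) and
    // 4 on Linux/macOS (UTF-32 code points); implementation-defined.
    std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << '\n';
}
```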

UTF-16 is a concept of text represented in 16-bit elements, but an actual textual character may consist of more than one element. `std::wstring` is just a collection of these elements; it is a class primarily concerned with their storage. The element type of a `wstring`, `wchar_t`, is at least 16 bits wide on common platforms but may be 32.
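If the elements do hold UTF-16, recovering the character count means scanning for surrogate pairs yourself; `wstring` won't do it. A sketch of such a helper (a hypothetical function, assuming well-formed UTF-16 in a 16-bit `wchar_t`):

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-16-encoded wstring. High surrogates
// (0xD800-0xDBFF) mark the first element of a two-element pair, so the
// element that follows one belongs to the same character.
std::size_t codePointCount(const std::wstring& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        ++count;
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF)
            ++i;  // skip the low surrogate of the pair
    }
    return count;
}
```

For example, `codePointCount(L"\U0001D11E")` returns 1 even though `size()` reports 2 elements under a 16-bit `wchar_t`.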
- Can you please explain in more detail, perhaps with an example? For instance, the character 'A' is stored in a `std::wstring` as 0x0041. How is it stored in UTF-16 format? – hkBattousai Nov 22 '10 at 15:50
- @Inverse: That's why everyone should just use ASCII; there wouldn't be so much grief over memory use ;) – Matthieu M. Nov 22 '10 at 16:36
- For those who may not understand the humor in the comments above: [UTF-16](https://en.wikipedia.org/wiki/UTF-16) is a 16-***bit*** Unicode encoding. Also, in UTF-16, a character that requires more than one 16-bit element is encoded via [surrogate pairs](https://en.wikipedia.org/wiki/UTF-16#U.2B10000_to_U.2B10FFFF). – DavidRR Apr 27 '15 at 13:59