Using strings in C++ development is always a bit more complicated than in languages like Java or scripting languages. I think some of the complexity comes from a performance focus in C++ and some is just historical.
I'd say it's all historical. In particular, two pieces of history:
- C was developed back in the days when everyone (even Japan) was using a 7-bit or 8-bit character encoding. Because of this, the concepts of char and "byte" are hopelessly confounded.
- C++ programmers quickly recognized the desirability of having a string class rather than just raw char*. Unfortunately, they had to wait 15 years for one to be officially standardized. In the meantime, people wrote their own string classes that we're still stuck with today.
Anyhow, I've used two of the classes you mentioned:
MFC CString
MSDN documentation
There are actually two CString classes: CStringA uses char with "ANSI" encoding, and CStringW uses wchar_t with UTF-16 encoding. CString is a typedef of one of them depending on a preprocessor macro. (Lots of things in Windows come in "ANSI" and "Unicode" versions.)
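The typedef-by-macro pattern can be sketched portably as below. This is only an illustration of the idea, not the actual MFC headers: MyStringA, MyStringW, MyString, MY_UNICODE and MY_TEXT are hypothetical names standing in for CStringA, CStringW, CString and the Windows macros (the real headers key off macros such as UNICODE and _UNICODE).

```cpp
#include <string>

// Hypothetical stand-ins for CStringA / CStringW so the sketch builds anywhere.
typedef std::string  MyStringA;  // char-based ("ANSI") variant
typedef std::wstring MyStringW;  // wchar_t-based variant

// A single preprocessor macro decides what the unsuffixed name refers to.
#ifdef MY_UNICODE
typedef MyStringW MyString;
#define MY_TEXT(s) L##s          // wide string literal
#else
typedef MyStringA MyString;
#define MY_TEXT(s) s             // narrow string literal
#endif

// Code written against MyString compiles in either configuration.
inline MyString greeting() {
    return MY_TEXT("hello");
}
```

Building with -DMY_UNICODE switches every use of MyString to the wide variant without touching the calling code, which is the same trade-off the "ANSI"/"Unicode" split gives you on Windows.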
You could use UTF-8 for the char-based version, but this has the problem that Microsoft refuses to support "UTF-8" as an ANSI code page. Thus, functions like Trim(const char* pszTargets), which depend on being able to recognize character boundaries, won't work correctly if you use them with non-ASCII characters.
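To see why character boundaries matter, here is a minimal sketch of a trim that treats every byte of the target set as a separate character. trim_left_bytes and trimmed_example are hypothetical helpers written for this illustration; they are not how CString implements Trim, they just show what goes wrong when UTF-8 is processed without knowledge of the encoding.

```cpp
#include <string>

// Byte-oriented left trim: each byte of `targets` is treated as one
// "character" to strip. Hypothetical helper, not CString's implementation.
std::string trim_left_bytes(const std::string& s, const std::string& targets) {
    const std::string::size_type pos = s.find_first_not_of(targets);
    return pos == std::string::npos ? std::string() : s.substr(pos);
}

// In UTF-8, "é" is the bytes C3 A9 and "à" is the bytes C3 A0.
// Asking to trim "é" also strips the C3 lead byte of the following "à".
inline std::string trimmed_example() {
    const std::string text    = "\xC3\xA9" "\xC3\xA0" "bc";  // "éàbc"
    const std::string targets = "\xC3\xA9";                  // "é"
    return trim_left_bytes(text, targets);
}
```

The result starts with a lone continuation byte (A0), so it is no longer valid UTF-8. With pure ASCII input the same function behaves as expected.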
Since UTF-16 is natively supported, you'll probably prefer the wchar_t-based version.
Both CString classes have a fairly convenient interface, including a printf-like Format function. You can even pass CString objects directly to this varargs function, thanks to the way the class is implemented.
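For comparison, a printf-like formatter for std::string might look roughly like this. It is a portable sketch of the idea behind a Format-style function, not MFC code; format_string is a hypothetical name.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// printf-style formatting into a std::string (hypothetical helper).
std::string format_string(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);

    // First pass on a copy of the argument list: measure the output length.
    va_list measure;
    va_copy(measure, args);
    const int n = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);

    if (n < 0) {              // encoding or format error
        va_end(args);
        return std::string();
    }

    // Second pass: write into a buffer with room for the terminating NUL.
    std::vector<char> buf(static_cast<std::size_t>(n) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    va_end(args);

    return std::string(buf.data(), static_cast<std::size_t>(n));
}
```

A call such as format_string("%s has %d items", name.c_str(), 3) works, but note the c_str(): passing a std::string object itself through the "..." is not supported, which is exactly the convenience CString offers here.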
The main disadvantages are:
- Slow performance for very large strings. (Last I checked, anyway.)
- Lack of integration with the C++ standard library. No iterators, not even << and >> for streams.
- It's Windows-only.
(That last point has caused me much frustration since I got put in charge of porting our code to Linux. Our company wrote our own string class that's a clone of CString but cross-platform.)
std::basic_string
The good thing about basic_string is that it's the standard.
The bad thing about it is that it doesn't have Unicode support. OTOH, it doesn't actively get in the way of Unicode either, as it lacks member functions like upper() / lower() that would depend on the character encoding. In that sense, it's really more of a "dynamic array of code units" than a "string".
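A small sketch of the "array of code units" view, assuming the string holds UTF-8. ascii_upper and sample are hypothetical helpers for this illustration; the upper-casing relies on std::toupper in the default "C" locale, which only changes the ASCII letters a to z.

```cpp
#include <cctype>
#include <string>

// Upper-case every byte independently, as naive code on std::string would.
std::string ascii_upper(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
    return s;
}

// "héllo" encoded as UTF-8: the "é" takes two code units (C3 A9).
inline std::string sample() {
    return "h" "\xC3\xA9" "llo";
}
```

Here sample().size() is 6 even though a reader sees five characters, and ascii_upper(sample()) upper-cases the ASCII letters but leaves the two bytes of "é" untouched. std::string neither knows nor cares what the bytes mean.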
There are libraries that let you use std::string with UTF-8, such as the above-mentioned UTF8-CPP and some of the functions in the Poco library.
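To give a flavor of what such libraries layer on top of std::string, here is a hand-rolled code point counter. It is a simplified sketch, not the API of UTF8-CPP or Poco: count_code_points is a hypothetical name, and the function assumes the input is already valid UTF-8 (a real library would also validate it).

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes valid UTF-8; performs no validation.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return n;
}
```

For the UTF-8 encoding of "héllo" this returns 5, while size() on the same string returns 6.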
For which size characters to use, see std::wstring vs std::string.