6

I've been coding in C++ for quite some time now, and I think most people who actually code in C++ would agree that one of the trickiest decisions is choosing from the almost dizzying number of string types available. I mostly prefer ATL's CString for its ease of use and features, but I would like a comparative study of the available options. I've searched SO and haven't found anything that helps with choosing the right string type. There are websites that describe conversions from one string type to another, but that's not what we want here.

I would love to have a comparison based on specialty, performance, portability (Windows, Mac, Linux/Unix, etc.), ease of use/features, multi-language support (Unicode/MBCS), cons (if any), and any other special cases.

I'm listing the string types that I've encountered so far. I believe there are more, so we may edit this later to accommodate other options. Mind you, I've worked mostly on Windows, so the list reflects that:

  1. char*
  2. std::string
  3. STL's basic_string
  4. ATL's CString
  5. MFC's CString
  6. BSTR
  7. _bstr_t
  8. CComBSTR
Mateen Ulhaq
Samrat Patil
  • Use them all. You'll end up with them all in there anyway, so don't bother trying to be picky at the start – Will Dean Nov 10 '10 at 08:59
  • Use `CComBSTR` only when you work with COM. – J-16 SDiZ Nov 10 '10 at 09:02
  • You could add ICU strings in the mix for proper unicode handling. – Matthieu M. Nov 10 '10 at 09:03
  • Use whatever is best for the task? I'm sure they all have merits and downfalls. One thing I have noticed around here, though, is anyone posting questions using char*, gets at least one blanket answer of "use std::strings instead". That seems to be the go-to solution. – badgerr Nov 10 '10 at 09:04
  • @badgerr - Use whatever is best for the task? That's what I've asked this question to find out. Probably we all will. Thanks. – Samrat Patil Nov 10 '10 at 09:07
  • @Matthieu - ICU strings? – Samrat Patil Nov 10 '10 at 09:18
  • @Samrat: http://site.icu-project.org/ --> International Components for Unicode. Perhaps the most widely deployed library to deal with Unicode, internationalized and localized strings. – Matthieu M. Nov 10 '10 at 16:26

5 Answers

7

I don't mean to put a dampener on your enthusiasm for this, but realistically it's inefficient to mix a lot of string types in one project, so the larger the project gets, the more inevitably it should settle on std::string (which is a typedef for an instantiation of STL's basic_string for type char, not a different entity), given that it's the only Standard value-semantic option. char* is OK mainly for fixed-size strings (e.g. string literals, fixed-size buffers) or for interfacing with C.

Why do I say it's inefficient? You end up with needless template instantiations for the variety of string arguments (permutations, even, for multiple arguments). You find yourself calling functions that want to load a result into a string&, then having to call .c_str() on that and construct some other type, doing redundant memory allocation. Even const std::string& requires a string temporary if called with a NUL-terminated (ASCIIZ) char* (e.g. one pointing to some other string type's buffer). When you want to write a function that handles whatever type of string a particular caller wants to use, you're pushed towards templates and therefore inline code, longer compile times and recompilation dependencies. There are some ways to mitigate this, but they get complex, and to be convenient or automated they tend to require changes to the various string types (e.g. a casting operator or a member function returning some common interface/proxy object).
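To make the hidden cost concrete, here is a minimal sketch of the temporary described above. `LegacyString` is a hypothetical stand-in for any non-Standard string class (it is not a real library type), and `count_spaces` and `count_spaces_in_legacy` are illustrative names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a non-Standard string class; only the
// c_str() accessor matters for this illustration.
class LegacyString {
public:
    explicit LegacyString(const char* s) : data_(s) {}
    const char* c_str() const { return data_.c_str(); }
private:
    std::string data_;  // storage detail, irrelevant to the point being made
};

// A function written against the Standard type.
std::size_t count_spaces(const std::string& s) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), ' '));
}

// Crossing string types: c_str() yields a const char*, so binding it to
// const std::string& makes the compiler construct a temporary std::string,
// which copies the characters (and may allocate) on every call.
std::size_t count_spaces_in_legacy(const LegacyString& ls) {
    return count_spaces(ls.c_str());
}
```

The call compiles without any visible conversion, which is why this kind of copy tends to go unnoticed when several string types coexist in one code base.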

Projects may need to use non-Standard string types to interact with libraries they want to use, but you want to minimise that and limit the pervasiveness if possible.

Tony Delroy
  • For more information on `std::string` and unicode, look [here](http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring). – Björn Pollex Nov 10 '10 at 09:05
2

The sorry story of C++ string handling is too depressing for me to write an essay on, but just a couple of points:

  • ATL and MFC CString are the same thing (same code and everything). They were merged years ago.

  • If you're using either _bstr_t or CComBSTR, you probably wouldn't use BSTR directly, except in calls into other people's APIs that take a BSTR.

Will Dean
2
  • char* - fast; its features are those in the <cstring> header; error-prone (too low-level)

  • std::string - this is actually a typedef for std::basic_string<char, char_traits<char> >. A beautiful thing: first of all, it's fast too. Second, you can use everything in <algorithm>, because basic_string provides iterators. For wide-character support there is another typedef, wstring, which is std::basic_string<wchar_t, char_traits<wchar_t> >. basic_string is a Standard type and is therefore portable. I'd go with this one.

  • ATL's and MFC's CStrings do not even provide iterators, so to me they are an abomination: they are a class wrapper around C strings and are very badly designed, IMHO.

  • don't know about the rest.
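As a small illustration of the iterator point above, the following sketch applies `<algorithm>` functions directly to a `std::string`. The helper names `to_upper_copy` and `is_palindrome` are made up for this example, and the lambda assumes C++11 or later.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Upper-cases a copy of the string using std::transform over its iterators.
// The unsigned char parameter avoids undefined behaviour in std::toupper
// for negative char values.
std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return s;
}

// Compares the string with its own reverse using forward and reverse iterators.
bool is_palindrome(const std::string& s) {
    return std::equal(s.begin(), s.end(), s.rbegin());
}
```

The same calls work on `std::wstring`, since both are instantiations of `basic_string` and expose the same iterator interface.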

Hope this partial information helps.

Armen Tsirunyan
1

Obviously, only the first three are portable, so they should be preferred in most cases. If you're doing C++, then you should avoid char* in most instances, as raw pointers and arrays are error-prone. Interfacing with low-level C, such as in system calls or drivers, is the exception. std::string should be preferred by default, IMHO, because it meshes so nicely with the rest of the STL.
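A rough sketch of what that C-interfacing exception can look like while the rest of the code stays on `std::string`. The helpers `c_length` and `format_id` are hypothetical names for this example; `std::strlen` and `std::snprintf` stand in for whatever C API is actually being called.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Passing a std::string to a C function that expects const char*:
// c_str() exposes the NUL-terminated buffer without copying.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Receiving text from a C function: let it fill a fixed-size buffer,
// then copy the result into a std::string right at the boundary.
std::string format_id(int id) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "id-%d", id);
    if (n < 0) {
        return std::string();  // encoding error reported by snprintf
    }
    // snprintf returns the length it wanted to write; clamp to what fits.
    std::size_t len = static_cast<std::size_t>(n);
    if (len >= sizeof buf) {
        len = sizeof buf - 1;
    }
    return std::string(buf, len);
}
```

The raw buffer lives only inside `format_id`, so callers never handle a `char*` themselves.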

Again, IMHO, if you need to work with e.g. MFC, you should work with everything as std::string in your business logic, and translate to and from CString when you hit the WinApi functions.

Oliver Charlesworth
1

2 and 3 are the same. 4 and 5 are the same, too. 7 and 8 are wrappers around 6. So, arguably, the list contains just C strings, Standard C++ strings, Microsoft's C++ strings, and Microsoft's COM strings. That gives you the answer: in Standard C++, use Standard C++ strings (std::string).

MSalters