
At work I'm mainly using C#, but I want to learn C++ as well, and chars/strings are somewhat confusing. For example, I know that TCHAR can either be a regular char or, if I'm using Unicode, a wchar_t. So, always use TCHAR? But then you find questions like Is TCHAR still relevant?
OK, let's use something else...? So far I've only used char and std::string, and at this point I have no idea whether that was a good approach or not; I'm a bit lost.

What should I use if I'm writing a program which will not be translated to another language?
What should I use if I'm writing a program which will be translated to another language?
What should I use if I'm writing a program which will only be used in countries that use Latin characters, but which do not have English as their native language (hello ä, ö, ü, ß, æ, Ø, ï...)?
Is there anything that I can safely ignore because it's outdated?

CookedCthulhu
  • I suggest reading [`std::string` vs. `std::wstring`](http://stackoverflow.com/a/402918/5470596). – YSC Mar 17 '16 at 09:03
  • If you ask me: Always use `char` (and `std::string`) and UTF-8. *Always*. http://utf8everywhere.org is a good read. Moreover, thinking that sticking with English means ASCII (or its set of characters) will be enough is a bit naïve. – Biffen Mar 17 '16 at 09:31
  • Use wide strings (TCHAR with _UNICODE and UNICODE defined, or WCHAR) everywhere, all the time. Prefer std::wstring when writing C++ code. The only exception is if you want to use UTF-8. http://utf8everywhere.org hasn't managed to sell me on it yet, but it makes fair arguments. – Cody Gray - on strike Mar 17 '16 at 09:39

2 Answers


> So, always use TCHAR?

Not really recommended, as this is a Windows-only macro. But if you plan to use your code on another platform, it's easy to define your own TCHAR. Personally I always use TCHAR, as the application I work on started as a Windows-only project.
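
For illustration, a minimal sketch of such a shim (the non-Windows aliases below are my own assumption, not part of any standard header):

```cpp
#include <cstdio>

#ifdef _WIN32
#  include <tchar.h>          // the real TCHAR, _T() and _tprintf from the Windows CRT
#else
   using TCHAR = char;        // assumed shim: plain char on other platforms
#  define _T(x) x
#  define _tprintf std::printf
#endif

int main()
{
    const TCHAR* msg = _T("portable TCHAR sketch\n");
    _tprintf(msg);            // wide or narrow printf, depending on the build
    return 0;
}
```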

> ...if I'm using Unicode, a wchar_t. So, always use TCHAR? But then you find questions like Is TCHAR still relevant? OK, let's use something else...?

By default, Visual Studio creates projects with the UNICODE macro defined. This means the Win32 API functions will accept WCHAR strings, and it also means that TCHAR will resolve to WCHAR, i.e. wchar_t. So if you work strictly with the Windows UI, it's better for you to use wchar_t and std::wstring.
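
A minimal sketch, assuming a default Unicode build (UNICODE and _UNICODE defined): TCHAR is then wchar_t, _T("...") produces a wide literal, and the MessageBox macro expands to MessageBoxW.

```cpp
#include <windows.h>
#include <tchar.h>

int main()
{
    const TCHAR* caption = _T("TCHAR demo");   // wchar_t* in a Unicode build, char* otherwise
    MessageBox(nullptr, _T("Hello from a TCHAR build"), caption, MB_OK);
    return 0;
}
```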

> What should I use if I'm writing a program which will not be translated to another language? What should I use if I'm writing a program which will be translated to another language? ...

Why do you assume it will not be? I would suggest preparing your code for the worst-case scenario, where it will have to accept Chinese characters. So if you have texts in some resources, keep them in UTF-8, and in your C++ code use char strings (std::string) to manage them. When you need to show them using some Windows API, convert them to wchar_t. Write portable code, i.e. a backend that does not use any of TCHAR or WCHAR, and a frontend which communicates with platform APIs like MFC, WinAPI or Qt...
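
A minimal sketch of that split; utf8_to_wide is a made-up helper name, while MultiByteToWideChar is the real Win32 call. It assumes the source file is compiled as UTF-8 (e.g. /utf-8 with MSVC):

```cpp
#include <string>
#include <windows.h>

// Convert a UTF-8 std::string to a UTF-16 std::wstring for the W API functions.
std::wstring utf8_to_wide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First call computes the required length, second call does the conversion.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &wide[0], len);
    return wide;
}

int main()
{
    std::string text = "Grüße, 世界";   // backend keeps UTF-8 in a plain std::string
    MessageBoxW(nullptr, utf8_to_wide(text).c_str(), L"UTF-8 to UTF-16", MB_OK);
    return 0;
}
```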

marcinj

A heavily opinionated, but experience-based, answer

Before I begin, let me state that I've been working on C++ software for five years, with millions of users globally. In doing so I've learned a hell of a lot about how things work in the real world.

The first thing to understand is that Windows natively uses UTF-16 (originally UCS-2), aka wide chars, and in doing so makes your life much, much harder. (Almost) every other operating system uses UTF-8, and by that I mean OS X, *NIX, Android, iOS, pretty much anything you can throw a C++ compiler at.

Because of this: do you EVER intend to use your code outside of Windows? If you don't, there's no reason not to do it the "Windows way", with std::wstring being your best friend here. You can very easily use .c_str() to get a const wchar_t * (which is exactly what an LPCWSTR parameter expects). Many of these Windows types, such as LPCWSTR and TCHAR, are really just aliases (typedefs or #defines) for the underlying character types. You can read more on that here.
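
A small sketch of that, assuming a Windows build and a source file saved as UTF-8 (e.g. /utf-8 with MSVC) so the wide literal comes out right:

```cpp
#include <string>
#include <windows.h>

int main()
{
    std::wstring caption = L"Grüße";   // ä, ö, ü, ß are fine in UTF-16
    // .c_str() yields const wchar_t*, which is exactly what LPCWSTR names,
    // so it goes straight into the W variants of the API.
    MessageBoxW(nullptr, L"Hello from std::wstring", caption.c_str(), MB_OK);
    return 0;
}
```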

Should you bother with UTF-16 wide characters at all?
It's very, very tempting to think "what if I just ignore languages that don't use a Latin alphabet", but trust me when I say: don't. Yes, you could use multibyte characters only, or implicitly call only the A variants of the API functions. However, while this works (and works very well), if you ever support any language beyond the Latin-based ones you will run into problems. And even if you don't, users will still expect to be able to type in their native language.

TL;DR

English only, cross-platform? - In short, there is nothing inherently wrong with using only ANSI 8-bit strings all over your Windows programming - it won't crash the internet - and if you are writing something that you know for certain will only ever be used by English speakers across platforms (software for America?), then I actually recommend changing your project to Multi-Byte and using std::string for everything. Just don't expect to open a single file with an international filename.
Keep that in mind: if your user base is in the thousands, go UTF-8; if it's in the tens of thousands, people are going to get mildly angered by not being able to load kanji filenames.

International, Windows only - If your software is going to come even close to approaching the internet border of Sweden (where it needs to load a filename written in Goa'uld), use std::wstring, use UTF-16, and be happy in Windows-only software. To be honest, this is the state of most Windows software today.
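
A quick sketch of the wide, Windows-only route; the filename here is just an example:

```cpp
#include <string>
#include <windows.h>

int main()
{
    std::wstring path = L"Grüße.txt";   // UTF-16 path, no conversion needed
    HANDLE file = CreateFileW(path.c_str(), GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE)
        CloseHandle(file);
    return 0;
}
```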

International, Macs are cool? - Your project manager wants cross-platform software yesterday. It needs to run on Mac and PC, because 16% of the users it's being deployed to are Mac users (according to marketing), and it needs to have IME support for Arabic and Japanese.
Tell your project manager you are going to write a wrapper for all your API calls; it will take a week longer, but it will prevent any cross-platform language nonsense. If he doesn't agree, quit.

Then do just that: use UTF-8 under the bonnet, and have any API calls to the Windows/Mac system handled by a wrapper class you wrote yourself. Yes, it will take some effort and maintenance, but it saves you a lot of time in the long run.
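
A heavily simplified sketch of that wrapper idea; show_message is a made-up name, the Windows branch converts UTF-8 to UTF-16 with MultiByteToWideChar, and the other branch passes UTF-8 straight through:

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#  include <windows.h>
#endif

// The portable core only ever sees UTF-8 std::string; the platform part is hidden here.
void show_message(const std::string& utf8_text)
{
#ifdef _WIN32
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8_text.data(), (int)utf8_text.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8_text.data(), (int)utf8_text.size(), &wide[0], len);
    MessageBoxW(nullptr, wide.c_str(), L"Info", MB_OK);
#else
    std::printf("%s\n", utf8_text.c_str());   // most other platforms take UTF-8 as-is
#endif
}

int main()
{
    show_message("Grüße, 世界");   // UTF-8, assuming the source is compiled as UTF-8
    return 0;
}
```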

EXTRA LINKS

If you need very complex Unicode support, check out the ICU library (OS X uses this under the hood!).
Learn to use Boost - the filesystem support alone makes cross-platform C++ development much, much faster.
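
As a tiny taster of the Boost.Filesystem point (boost::filesystem::path keeps the name in the platform's native form, so the same code builds everywhere; the filename is just an example):

```cpp
#include <boost/filesystem.hpp>
#include <iostream>

int main()
{
    // Build a path relative to the current directory; the non-ASCII name is handled for us.
    boost::filesystem::path p = boost::filesystem::current_path() / "Grüße.txt";
    std::cout << (boost::filesystem::exists(p) ? "found: " : "not found: ") << p << '\n';
    return 0;
}
```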

John Bargman
  • There are plenty of things wrong with using ANSI 8-bit strings for Windows programming, and I can't think of *any* advantages that it holds over using UTF-16 strings. Oh wait, you save a few bytes. That's hardly worth it, considering the performance penalty you'll pay for having to do the conversion *each* time you call an API function. The ANSI wrappers are deprecated, and have been for a long time. I'm surprised they haven't gone away entirely by now. There certainly are lots of newer functions that don't have ANSI wrappers, available only in Unicode. – Cody Gray - on strike Mar 17 '16 at 09:37
  • @CodyGray That is exactly what I said - note the part where I mentioned loading a Chinese filename, and recommended against using UTF-8 - though I should make a little edit to the fifth paragraph to clarify. As for the advantages of UTF-8: every other operating system, that's the advantage. – John Bargman Mar 17 '16 at 09:43
  • `Oh wait, you save a few bytes` - if you manage large databases with lots of text (i.e. maps), then it makes a difference. Also, if your app works on some embedded devices where more memory means the device will cost more, then it makes a difference. `considering the performance penalty you'll pay for having to do the conversion` - showing texts in the UI requires you to fill, say, a list or text box; let's say even 100 conversions to show the current UI state - is that really a lot? Will it really make a difference if it takes 1ms instead of 100ns? – marcinj Mar 17 '16 at 09:44
  • @marcinj , absolutely agreed - admittedly these are (for most developers) Edge cases. But I've seen some amazing database savings from switching to UTF-8 support. That said, for a "windows only" application - UTF16 simplifies things. – John Bargman Mar 17 '16 at 09:46
  • John, my comment was directed solely at the "English only" section of your answer. I couldn't disagree more with that advice. The rest of the advice is sage. marcin, there aren't very many memory-constrained embedded devices running Windows. You're right, you can choose any format you want for data persistence, but when storing strings in memory and interacting with the operating system, it makes little sense not to speak its preferred dialect. You have to have a rather compelling reason to go against the grain, and saving a few bytes doesn't seem like a good one. – Cody Gray - on strike Mar 17 '16 at 09:46
  • @CodyGray , I realised I had totally forgotten to specify that was supposed to be "english only, cross platform", I've corrected my error. – John Bargman Mar 17 '16 at 09:49
  • @JohnBargman I learned the hard way that assuming an application will always be Windows-only can cause lots of problems. – marcinj Mar 17 '16 at 09:54
  • @marcinj, as have I, but I must present the option; it's the OP's decision, methinks? – John Bargman Mar 17 '16 at 10:01
  • At the moment I'm not using C++ very much for productive work (just for some very small scripts) and **right now** I have no intentions of using it outside of Windows 7/8/10. I also don't have any plans for translating my programs to Chinese, but my native language is German and I've noticed that it's not that easy to display characters like `ä, ß, ü` etc. in C++ and wasn't quite sure what to use. C# just does it for you. Thanks to all three of you! – CookedCthulhu Mar 17 '16 at 13:10
  • @PhilippSchmid You are very welcome. – John Bargman Mar 19 '16 at 13:54