Spanish characters in C++ Windows/Mac/iOS

Question

I'm having some issues with spanish characters displaying in an iOS app. The code in question is all C++, and shared between both a Windows app and an iOS app. Compiled in Windows using Visual Studio 2010 (character set is Multi-byte). And Compiled using Xcode 4.2 on the Mac.

Currently, the code is using char pointers, and my first thought was that I need to switch over to wchar_t pointers instead. However, I noticed that the Spanish characters I want to output display just fine in Windows using just char pointers. This made me think those characters are a part of the multi-byte character set and I don't need to go to all the trouble of updating everything to wchar_t until I'm ready to do some Japanese, Russian, Arabic, etc. translations.

Unfortunatly, while the Spanish characters do display property in the Windows app, they do not display right once they hit the Mac/iOS. Experimenting with wchar_t there, I see that they will display properly if I convert everything over. But what I don't understand, and hoping someone can enlighten me as to the reason... is why are the characters perfectly valid on the Windows machine, same code, and dislaying as gibberish (requiring wchar_t instead) in the Mac environment?

Is visual studio doing something to my char pointers behind the scenes that the Mac is not doing? In other words, is the Microsoft environment simply being more forgiving to my architectural oversight when I used char pointers instead of wchar_t?

Seeing as how I already know my answer is to convert from char pointers to wchar_t pointers, my real question then is "Why does the Mac require wchar_t but in Windows I can use char for the same characters?"

Thanks.

I didn't think Spanish used any special characters aside from `Ñ`/`ñ`, which fit in ASCII and so Windows might be clever enough to char-squeeze. — Toomai, Dec 05 '11 at 19:53
Spanish also uses the accent acute (´) over various letters. — , Dec 05 '11 at 20:05
You can use std::wstring instead of std::string to encapsulate the C stuff like char/wchar_t pointers. Deleted my answer because it didn't answer your question. — Andrew Rasmussen, Dec 05 '11 at 20:16

score 3 · Accepted Answer · answered Dec 05 '11 at 19:54

Mac and Windows use different codepages--they both have Spanish characters available, but they show up as different character values, so the same bytes will appear differently on each platform.

The best way to deal with localization in a cross-platform codebase is UTF8. UTF8 is supported natively in NSString -stringWithUTF8String: and in Windows Unicode applications by calling MultiByteToWideChar with CP_UTF8. In fact, since it's Unicode, you can even use the same technique to handle more complicated languages like Chinese.

Don't use wide characters in cross-platform code if you can help it. This gets complicated because wchar_t is actually 32 bits wide on OS X. In fact, it's wasteful of memory for that reason as well.

http://en.wikipedia.org/wiki/UTF-8

score 2 · Answer 2 · edited May 23 '17 at 10:34

None of char, wchar_t, string or wstring have any encoding attached to them. They just contain whatever binary soup your compiler decides to interpret the source files as. You have three variables that could be off:

What your code contains (in the actual file, between the '"' characters, on a binary level).
What your compiler thinks this is. For example, you may have a UTF-8 source file, but the compiler could turn wchar_t[] literals into proper UCS-4. (I wish MSVC 2010 could do this, but as far as I know, it does not support UTF-8 at all.)
What your rendering API expects. On Windows, this is usually Little-Endian UTF-16 (as a LPWCHAR pointer). For the old LPCHAR APIs, it is usually the "current codepage", which could be anything as far as I know. iOS and Mac OS use UTF-16 internally I think, but they are very explicit about what they accept and return.

No class or encoding can help you if there is a mismatch between any of these.

In an IDE like Xcode or Eclipse, you can see the encoding of a file in its property sheet. In Xcode 4, this is the right-most pane, bring it up with cmd+alt+0 if it's hidden. If the characters look right in the code editor, the encoding is correct. A first step is to make sure that both Xcode and MSVC are interpreting the same source files the same way. Then you need to figure what they are turned in into memory right before rendering. And then you need to ensure that both rendering APIs expect the same character set at all.

Or, just move your strings into text files separate from your source code, and in a well-defined encoding. UTF-8 is great for this, but everything will work that can encode all necessary characters. Then only translate your strings for rendering (if necessary).

I just saw this answer which gives even more reasons for the latter option: https://stackoverflow.com/a/1866668/401925

Spanish characters in C++ Windows/Mac/iOS

2 Answers2