Difference between char* and wchar_t*

Question

I am new to MFC. I am trying to write a simple MFC application and I'm getting confused in some places. For example, SetWindowText has two APIs, SetWindowTextA and SetWindowTextW: one takes char * and the other accepts wchar_t *.

What is the use of char * and wchar_t *?

[Unicode](http://msdn.microsoft.com/en-us/library/2dax2h36.aspx). — vanza, Oct 23 '13 at 04:29

mvp · Accepted Answer · 2022-11-30T10:42:39.703

26

char is used for the so-called ANSI family of functions (the function name typically ends with A). These interpret text in the current Windows code page, which is often loosely (and inaccurately) described as the ASCII character set.

wchar_t is used for the newer so-called Unicode (or Wide) family of functions (the function name typically ends with W), which use the UTF-16 encoding. It is very similar to UCS-2, but not identical: a character that does not fit in a single 2-byte code unit is encoded as two code units (a surrogate pair), and this can be very confusing.
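
To make the surrogate-pair point concrete, here is a minimal, portable sketch of how a code point above U+FFFF splits into two UTF-16 code units. The helper name `encodeUtf16` is made up for illustration and is not part of any Windows API. The sketch assumes the input is a valid Unicode scalar value and does no error checking.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (no validation is performed).
std::u16string encodeUtf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in one 16-bit code unit (this is all UCS-2 can represent).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Needs a surrogate pair: split the 20 bits left after subtracting 0x10000.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return out;
}
```

With this sketch, `encodeUtf16(U'A')` yields one code unit (0x0041), while `encodeUtf16(0x1F600)` yields two (0xD83D, 0xDE00).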

Converting one to the other is not a simple task. You will need something like MultiByteToWideChar, which requires you to know and supply the code page of the input ANSI string.
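
As a rough, Windows-only sketch of that conversion: the wrapper name `widen` and its default of `CP_ACP` (the current ANSI code page) are assumptions made for illustration, and only `MultiByteToWideChar` itself comes from the Windows API. It uses the usual two-call pattern, first asking for the required length and then converting.

```cpp
#ifdef _WIN32  // Windows-only: MultiByteToWideChar lives in <windows.h>
#include <windows.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert a narrow string in the given code page to UTF-16.
std::wstring widen(const std::string& narrow, UINT codePage = CP_ACP) {
    if (narrow.empty()) return std::wstring();  // a zero-length input is rejected by the API
    const int srcLen = static_cast<int>(narrow.size());

    // First call: ask how many wchar_t units the result needs.
    const int needed = MultiByteToWideChar(codePage, 0, narrow.data(), srcLen, nullptr, 0);
    if (needed <= 0) throw std::runtime_error("MultiByteToWideChar failed");

    // Second call: perform the conversion into a buffer of that size.
    std::wstring wide(static_cast<std::size_t>(needed), L'\0');
    const int written = MultiByteToWideChar(codePage, 0, narrow.data(), srcLen, &wide[0], needed);
    if (written != needed) throw std::runtime_error("MultiByteToWideChar failed");
    return wide;
}
#endif
```

The result depends on the code page you pass: the same bytes can decode to different text under different code pages, which is why you need to know where the input string came from.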

edited Nov 30 '22 at 10:42

answered Oct 23 '13 at 04:34

mvp

1

You've edited it, but you're missing the point: encoding a single character as two composite code units is the defining difference between UTF-16 and UCS-2. In UTF-16, you can have four-byte characters that are encoded in two halves. In UCS-2, that is not allowed, and **all** UCS-2 characters are two bytes. – Dietrich Epp Oct 23 '13 at 04:39
More nitpicking: `wchar_t` is not used for UTF-16 on all compilers. It contains "some" Unicode value, and the range of this value depends on the platform. On Windows compilers wchar_t is normally 2 bytes big and therefore contains UTF-16. However, on a Linux compiler it can easily be 4 bytes big. – SigTerm Oct 23 '13 at 05:35
**Windows** historically used `wchar_t` as UCS-2, but once the Unicode standard (and UTF-16) became a thing, it switched to treating `wchar_t` as UTF-16. Most of the important 'W' functions treat `wchar_t` properly as UTF-16, but I believe there still exist some that do not. – Dúthomhas Dec 26 '22 at 01:00
**Linux** has historically been all over the place with `wchar_t`, but all modern implementations I am aware of treat `wchar_t` as UTF-32. – Dúthomhas Dec 26 '22 at 01:02

score 4 · Answer 2 · answered Oct 23 '13 at 04:32

On Windows, APIs that take char * use the current code page whereas wchar_t * APIs use UTF-16. As a result, you should always use wchar_t on Windows. A recommended way to do this is to:

// Be sure to define this BEFORE including <windows.h>
#define UNICODE 1
#include <windows.h>

When UNICODE is defined, APIs like SetWindowText will be aliased to SetWindowTextW and can therefore be used safely. Without UNICODE, SetWindowText will be aliased to SetWindowTextA and therefore cannot be used without first converting to the current code page.
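
The aliasing mechanism can be mimicked in a few portable lines. This is only a sketch: `DemoTextA`, `DemoTextW` and `DemoText` are hypothetical stand-ins for SetWindowTextA, SetWindowTextW and SetWindowText, and the real header is more involved than this.

```cpp
#include <string>

// Hypothetical stand-ins for the A and W variants of a Windows API.
std::string DemoTextA(const char*)    { return "A"; }
std::string DemoTextW(const wchar_t*) { return "W"; }

#define UNICODE 1

// The generic name is a macro that picks one variant at compile time.
#ifdef UNICODE
#define DemoText DemoTextW
#else
#define DemoText DemoTextA
#endif
```

With UNICODE defined, `DemoText(L"hi")` compiles to a call to `DemoTextW`. Without it, the same source line would refer to `DemoTextA` and the wide literal would fail to compile.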

However, there's no good reason to use wchar_t when you are not calling Windows APIs, since its portable functionality is not useful, and its useful functionality is not portable (wchar_t is UTF-16 only on Windows; on most other platforms it is UTF-32. What a total mess.)
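
A small portable sketch shows the difference. The helper name `emojiUnits` is made up for illustration, and the expected values are what MSVC, GCC and Clang typically produce rather than something the language standard guarantees.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: how many wchar_t units the compiler used to store U+1F600.
std::size_t emojiUnits() {
    const std::wstring s = L"\U0001F600";
    return s.size();
}
```

Where wchar_t is 2 bytes (Windows), this typically returns 2 because the character is stored as a surrogate pair. Where it is 4 bytes (most Linux toolchains), it typically returns 1.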

Dietrich Epp

2

Actually, instead of defining UNICODE, you might as well directly call the *W functions, then there's no chance of some file not defining UNICODE by accident. – rubenvb Oct 23 '13 at 07:28
Surely using TCHAR is a better way to go, this way if for whatever reason you do need an ANSI compiled version it is still possible, and it also works with the compiler setting `SetWindowText` to `SetWindowTextW` or `SetWindowTextA` – Luke Oct 29 '13 at 14:34
@Luke: There is **no valid reason** to write new code that does not support Unicode. Avoid `TCHAR`. (Okay, there's maybe a valid reason 0.001% of the time, but that doesn't apply to you.) – Dietrich Epp Oct 29 '13 at 16:26
@Luke: If you use ANSI code pages then you won't be able to open files with filenames that contain characters outside your code page. That can be a huge problem. Also, your program will display paths incorrectly on certain systems: Install path `C:¥Program Files¥Company Name¥My Application`... something look wrong with that picture? It's because you used ANSI instead of Unicode, and the computer has a Japanese locale. **Don't use ANSI.** – Dietrich Epp Oct 29 '13 at 16:29
Is it possible in future the standard adds a different character encoded variable and Microsoft make it possible for TCHAR to be compiled to that? Using TCHAR is equivalent to using `wchar_t` yet also has the added benefit of also being `char` and potentially whatever it is in future. – Luke Oct 30 '13 at 09:40

score 1 · Answer 3 · edited Dec 26 '22 at 00:30

SetWindowTextA takes char*, which is a pointer to ANSI strings.
SetWindowTextW takes wchar_t*, which is a pointer to "wide" strings (Unicode).

SetWindowText has been defined (#define) to either of these in header Windows.h based on the type of application you are building. If you are building a UNICODE build then your code will automatically use SetWindowTextW.

SetWindowTextA is there primarily to support legacy code that needs to be built as SBCS (single-byte character set).

score -4 · Answer 4 · answered Oct 23 '13 at 04:34

char* : It means that this is a pointer to data of type char.

Example

// Regular char
char aChar = 'a';

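// Pointer to a single heap-allocated char
char* aPointer = new char;
*aPointer = 'a';
delete aPointer;

// Pointer to the first element of a heap-allocated array of 10 chars
char* anArray = new char[ 10 ];
*anArray = 'a';        // same as anArray[ 0 ] = 'a'
anArray[ 1 ] = 'b';
delete[] anArray;

// A fixed-size array of 10 chars; it decays to char* when passed to a function
char anotherArray[ 10 ];
*anotherArray = 'a';
anotherArray[ 1 ] = 'b';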

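wchar_t* : It means that this is a pointer to data of type wchar_t. The C and C++ standards intend wchar_t to be wide enough that any locale's char encoding can be converted to a wchar_t representation in which every wchar_t represents exactly one code point. On Windows, where wchar_t is 16 bits and holds UTF-16 code units, that does not hold for characters outside the Basic Multilingual Plane.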
