Preliminary note: "Unicode" and "Unicode character" in this answer, given the context of the question itself, refer to the UCS-2 (up to XP) and UTF-16 (starting with XP) encodings, and are used interchangeably with "wide character", wchar_t, WCHAR and other terms in the context of the Win32 API. The Unicode standard offers multiple encodings, such as UTF-8, UTF-16 and UTF-32, which all encode the same set of characters; different versions of the standard have a different scope. Surrogate code points are used to escape from the Basic Multilingual Plane (BMP), roughly the first 64K code points, and thus encode more than could be encoded with 16-bit characters at one character per code point. The surrogate extensions were developed for the Unicode 2.0 standard, which was published in the year NT 4.0 was released, but some years after the first "Unicode-capable" version of Windows, NT 3.1, was released. The original standard didn't account for characters outside the BMP, which is why "Unicode character" and "wide character" are even now used synonymously with Unicode in the Win32 API context, although this is inaccurate.
To answer the underlying question you raised:
Are wchar_t strings simply not comparable using the "==" operator?
No, they aren't, and neither are "ANSI" strings, i.e. strings based on the char type. Remember, a C string (whether wchar_t or char based) is handled through a pointer to its first character. This means that with == you were comparing two pointer values, which were definitely not equal. One, after all, points to a string literal (i.e. inside your program image) while the other points to memory allocated somewhere on the heap. So they are definitely two different entities, even if their contents are identical.
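The difference between comparing pointers and comparing contents can be sketched as follows. This is only an illustration: the helper names heap_copy, same_address, same_contents and demo_pointer_vs_contents are made up for this example, and a std::vector stands in for whatever heap allocation the original code used.

```cpp
#include <cassert>
#include <cwchar>
#include <vector>

// Hypothetical helper: copies a wide C string (terminator included) into
// heap storage, so the copy has the same contents but a different address.
std::vector<wchar_t> heap_copy(const wchar_t* s) {
    return std::vector<wchar_t>(s, s + std::wcslen(s) + 1);
}

// What == does on C strings: compares the two pointer values only.
bool same_address(const wchar_t* a, const wchar_t* b) {
    return a == b;
}

// What was actually intended: compares the characters being pointed to.
bool same_contents(const wchar_t* a, const wchar_t* b) {
    return std::wcscmp(a, b) == 0;
}

// Self-checking demonstration of the two comparisons.
void demo_pointer_vs_contents() {
    const wchar_t* literal = L"hello";
    std::vector<wchar_t> copy = heap_copy(literal);
    assert(!same_address(copy.data(), literal)); // different storage
    assert(same_contents(copy.data(), literal)); // identical characters
}
```

Calling demo_pointer_vs_contents() from main should pass both assertions: the addresses differ while the contents match.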
If you wanted to use ==, you would have to use a language such as C++ with the STL class std::string (or std::basic_string<_TCHAR>) or (on Windows) the ATL class CString (or rather CStringT). These classes are sometimes referred to as smart string classes and use the C++ facility of overloading operator==(). However, you should keep in mind that semantics differ depending on the implementation, so not every smart string class will compare the string contents. Some might merely compare this (i.e. is it the same instance), while others may compare the string contents case-insensitively or case-sensitively at their discretion.
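As a small sketch of the smart string approach, the following uses std::wstring, whose operator== compares the contents case-sensitively. The function names equal_contents, equals_c_string and demo_wstring_equality are hypothetical and only serve the example; other string classes may behave differently, as noted above.

```cpp
#include <cassert>
#include <string>

// std::wstring overloads operator== to compare contents (case-sensitive).
bool equal_contents(const std::wstring& a, const std::wstring& b) {
    return a == b;
}

// The overload set also covers comparing a std::wstring with a raw wide
// C string; the contents are compared, not the pointer value.
bool equals_c_string(const std::wstring& s, const wchar_t* raw) {
    return s == raw;
}

// Self-checking demonstration.
void demo_wstring_equality() {
    std::wstring a = L"something";
    std::wstring b = L"some";
    b += L"thing"; // same contents as a, but a separate object
    assert(equal_contents(a, b));
    assert(equals_c_string(a, L"something"));
    assert(!equal_contents(a, L"Something")); // differs in case
}
```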
To compare C strings you have the following functions available for your use-case:
- For "ANSI" character (char) strings: strcmp, _stricmp (and the "counted" variants: strncmp, _strnicmp ... there are more)
- For Unicode character (wchar_t) strings: wcscmp, _wcsicmp (and the "counted" variants: wcsncmp, _wcsnicmp ... there are more)
- For the variable character "type" (TCHAR) strings: _tcscmp, _tcsicmp (and the "counted" variants: _tcsncmp, _tcsnicmp ... there are more)
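A minimal sketch of the standard members of these families follows. It deliberately sticks to strcmp, strncmp, wcscmp and wcsncmp so that it compiles with any conforming compiler; the case-insensitive and TCHAR variants listed above come from the Microsoft CRT and are left out here. The wrapper names narrow_prefix_equal, wide_prefix_equal and demo_compare_functions are made up for the example.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// "Counted" comparison: true when the first n chars of both strings match.
bool narrow_prefix_equal(const char* a, const char* b, std::size_t n) {
    return std::strncmp(a, b, n) == 0;
}

// The same for wide character strings.
bool wide_prefix_equal(const wchar_t* a, const wchar_t* b, std::size_t n) {
    return std::wcsncmp(a, b, n) == 0;
}

// Self-checking demonstration: the functions return 0 for equal strings,
// a negative value if the first string sorts lower, a positive one if higher.
void demo_compare_functions() {
    assert(std::strcmp("apple", "apple") == 0);
    assert(std::strcmp("apple", "banana") < 0);
    assert(std::wcscmp(L"pear", L"peach") > 0);   // 'r' sorts after 'c'
    assert(narrow_prefix_equal("windows.h", "window", 6));
    assert(!wide_prefix_equal(L"tchar.h", L"wchar.h", 5));
}
```

Note that all of these return an ordering value rather than a boolean, so equality is tested with == 0.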
You can remember these prefixes:
- str -> string
- wcs -> wide character string
- tcs -> T character string
Side note: with #include <tchar.h> and windows.h, the macros TEXT and _T are equivalent and are used to declare a string literal that will be either "ANSI" or Unicode depending on the defines at build time. The same apparently holds for _TCHAR and TCHAR, where the latter appears to be favored in the Win32 API context.
So a Unicode build will expand _T("something") to L"something", while the "ANSI" build will expand it to "something".
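The mechanism can be imitated in a few lines. The following is only a rough sketch and not the actual contents of tchar.h or windows.h: MY_UNICODE, my_tchar, MY_T, my_tcs_equal and demo_text_macro are hypothetical stand-ins for the build-time define, TCHAR, _T/TEXT and _tcscmp respectively.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins; the real TCHAR, _T and TEXT come from the
// Windows headers and are switched by the project's Unicode defines.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x          // pastes the L prefix onto the literal
#else
typedef char my_tchar;
#define MY_T(x) x             // leaves the literal untouched
#endif

// Hypothetical counterpart of _tcscmp(a, b) == 0: picks the matching
// comparison function for the character type selected at build time.
inline bool my_tcs_equal(const my_tchar* a, const my_tchar* b) {
#ifdef MY_UNICODE
    return std::wcscmp(a, b) == 0;
#else
    return std::strcmp(a, b) == 0;
#endif
}

// Self-checking demonstration that holds in both build flavors.
void demo_text_macro() {
    const my_tchar* s = MY_T("something");
    assert(my_tcs_equal(s, MY_T("something")));
    // The literal's element type follows the build: 2 chars + terminator.
    assert(sizeof(MY_T("ab")) == 3 * sizeof(my_tchar));
}
```

Compiling the same source with and without -DMY_UNICODE switches every literal and comparison between the wide and narrow variants, which is the point of the TCHAR approach.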
As to TCHAR, consider reading through the arguments put forth in: Is TCHAR still relevant? (pointed out by rubenvb). There are valid points for and against using TCHAR/_TCHAR; you should make a decision and stick with it, i.e. be consistent.