16

GCC supports the -fshort-wchar flag, which switches wchar_t from four bytes to two.

What is the best way to detect the size of wchar_t at compile time, so I can map it correctly to the appropriate UTF-16 or UTF-32 type? At least until C++0x is released and gives us stable utf16_t and utf32_t typedefs.

#if ?what_goes_here?
  typedef wchar_t Utf32;
  typedef unsigned short Utf16;
#else
  typedef wchar_t Utf16;
  typedef unsigned int Utf32;
#endif
Jonathan Leffler
Chris Becke
  • Don't do this. wchar_t has nothing to do with Unicode. It is a distinct type which can hold all members of the largest extended character set of all supported locales. If your platform supports only ASCII then sizeof(wchar_t) can be 1. This also means, for example, that L"mötley crüe" is *not necessarily* a Unicode string - it could just as well be a Latin-1 string stored in wchar_t. – Nordic Mainframe Feb 02 '11 at 11:59
  • That is the most universally unhelpful comment ever. On the basis of that advice we should never attempt to deal with a UTF-encoded string until C++0x is universally available. In the meantime, I need a set of typedefs, for the platforms I support, that map to the most appropriate distinct types that can hold the data required. – Chris Becke Feb 02 '11 at 13:11

6 Answers

14

You can use the macros

__WCHAR_MAX__
__WCHAR_TYPE__

They are defined by GCC. You can check their values with `echo "" | gcc -E - -dM`.

As __WCHAR_TYPE__ can expand to int, short unsigned int, or long int, the best test, IMHO, is to check whether __WCHAR_MAX__ is above 2^16.

#if __WCHAR_MAX__ > 0x10000
  typedef ...
#endif
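
Filled in with the typedefs from the question, the test could look like this (a sketch that assumes unsigned short is 16 bits and unsigned int is 32 bits on the platforms being targeted):

#if __WCHAR_MAX__ > 0x10000
  // wchar_t is wider than 16 bits, so it can hold UTF-32 code points directly
  typedef wchar_t Utf32;
  typedef unsigned short Utf16;
#else
  // wchar_t is only 16 bits wide, so it maps to UTF-16 code units
  typedef wchar_t Utf16;
  typedef unsigned int Utf32;
#endif
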
Didier Trosset
  • I'm marking this as the answer, as it is the closest to what I was looking for. The template magic in the other answer does seem an even more clever way to support more platforms without knowing lots of platform-specific macros. – Chris Becke Feb 02 '11 at 13:16
13
template<int>
struct blah;   // primary template left undefined; only the specialized sizes are supported

template<>
struct blah<4> {   // 4-byte wchar_t: it can hold UTF-32
  typedef wchar_t Utf32;
  typedef unsigned short Utf16;
};

template<>
struct blah<2> {   // 2-byte wchar_t: it can hold UTF-16
  typedef wchar_t Utf16;
  typedef unsigned int Utf32;
};

typedef blah<sizeof(wchar_t)>::Utf16 Utf16;
typedef blah<sizeof(wchar_t)>::Utf32 Utf32;
Fred Nurk
  • Why would you assume that an unsigned short is 2 bytes wide and an unsigned int 4 bytes, and then not simply unconditionally typedef them? You're using your assumptions halfheartedly ... – etarion Feb 02 '11 at 11:11
  • @etarion: I simply answered the question. Wchar_t is a distinct type in C++ (I can't recall for C) and the OP (apparently) wants to use it. – Fred Nurk Feb 02 '11 at 11:25
  • This is a hugely clever way of using C++ to avoid #ifdef magic. That said, it does pollute the global namespace. – Chris Becke Feb 02 '11 at 13:14
  • @ChrisBecke: You can put blah (or utf_types :P) in a "detail" namespace, similar to how Boost hides implementation details. Hopefully the whole thing (including the last Utf16/32 typedefs) is also wrapped in a namespace for your project. – Fred Nurk Feb 02 '11 at 13:21
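
A sketch of that suggestion, with the helper hidden away as the comment describes (the namespace names here are illustrative, not part of the original answer):

namespace myproject {
  namespace detail {
    template<int> struct wchar_selector;

    template<> struct wchar_selector<4> {
      typedef wchar_t Utf32;
      typedef unsigned short Utf16;
    };

    template<> struct wchar_selector<2> {
      typedef wchar_t Utf16;
      typedef unsigned int Utf32;
    };
  }

  // Only these two names are meant to be used outside the implementation.
  typedef detail::wchar_selector<sizeof(wchar_t)>::Utf16 Utf16;
  typedef detail::wchar_selector<sizeof(wchar_t)>::Utf32 Utf32;
}
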
8

You can use the standard macro WCHAR_MAX:

#include <wchar.h>
#if WCHAR_MAX > 0xFFFFu
// ...
#endif

The WCHAR_MAX macro is defined by the ISO C and ISO C++ standards (see ISO/IEC 9899, 7.18.3 "Limits of other integer types", and ISO/IEC 14882, C.2), so you can use it safely on almost all compilers.
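
Whichever macro you test, the fixed-width halves of these typedefs still rest on the assumption that unsigned short is 16 bits and unsigned int is 32 bits. If you want that checked at compile time before C++0x's static_assert is available, the old negative-array-size trick works (a sketch, with illustrative names):

// Fails to compile if the assumed sizes are wrong: an array of size -1 is ill-formed.
typedef char assert_utf16_is_16_bits[sizeof(Utf16) == 2 ? 1 : -1];
typedef char assert_utf32_is_32_bits[sizeof(Utf32) == 4 ? 1 : -1];
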

Incnis Mrsi
ASBai
  • If `WCHAR_MAX` is defined in the ISO standards, you can use it safely on *all* compilers (since anything that doesn't define `WCHAR_MAX` is technically neither a C nor a C++ compiler). – Clearer Jan 08 '19 at 10:35
4

The size depends on the compiler flag -fshort-wchar:

g++ -E -dD -fshort-wchar -xc++ /dev/null | grep WCHAR
#define __WCHAR_TYPE__ short unsigned int
#define __WCHAR_MAX__ 0xffff
#define __WCHAR_MIN__ 0
#define __WCHAR_UNSIGNED__ 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_WCHAR_T__ 2
#define __ARM_SIZEOF_WCHAR_T 4
Xypron
2
$ g++ -E -dD -xc++ /dev/null | grep WCHAR
#define __WCHAR_TYPE__ int
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_WCHAR_T__ 4
Maxim Egorushkin
2

As Luther Blissett said, wchar_t exists independently of Unicode - they are two different things.

If you are really talking about UTF-16, be aware that there are Unicode characters which map to two 16-bit code units (U+10000..U+10FFFF), although these are rarely used in western countries/languages.
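
For reference, a code point in that range is encoded as a surrogate pair like this (a minimal sketch using the Utf16/Utf32 typedefs discussed above):

// Split a code point in the range U+10000..U+10FFFF into a UTF-16 surrogate pair.
void encode_surrogate_pair(Utf32 cp, Utf16 out[2])
{
  cp -= 0x10000;                                      // leaves a 20-bit value
  out[0] = static_cast<Utf16>(0xD800 + (cp >> 10));   // high (lead) surrogate
  out[1] = static_cast<Utf16>(0xDC00 + (cp & 0x3FF)); // low (trail) surrogate
}
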

sstn