9

How to check if a Unicode character is full width?

I use Win32 / MFC

For example, is full width, A is not full width, is full width, F is not full width.

linquize

2 Answers

8

What you need is the East Asian Width property of the character. You can get it by parsing the EastAsianWidth.txt file from the Unicode Character Database. I could not find a Win32 API that returns this information, but in Python, for example, you can call unicodedata.east_asian_width(ch), where ch is a one-character Unicode string.

See Unicode Standard Annex #11 (East Asian Width) for the background of the problem and more information.
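As a rough illustration of the Python route mentioned above, here is a minimal sketch. The helper name `is_full_width` is made up for this example, and treating both the `F` (Fullwidth) and `W` (Wide) classes as "full width" is an assumption; depending on your needs you may want only `F`.

```python
import unicodedata


def is_full_width(ch: str) -> bool:
    """Return True if ch has East Asian Width class F (Fullwidth) or W (Wide).

    Hypothetical helper: counting W as "full width" is an assumption of this
    sketch, not something the Unicode standard mandates for every use case.
    """
    return unicodedata.east_asian_width(ch) in ("F", "W")


# U+0041 LATIN CAPITAL LETTER A, U+FF21 FULLWIDTH LATIN CAPITAL LETTER A,
# U+4E2D (a CJK ideograph), U+FF76 HALFWIDTH KATAKANA LETTER KA
for ch in ["A", "\uFF21", "\u4E2D", "\uFF76"]:
    print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch), is_full_width(ch))
```

The results depend on the Unicode version bundled with your Python build (`unicodedata.unidata_version`), so a C++ program using its own table may disagree for recently added characters.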

Igor Skochinsky
  • This is the correct answer. FWIW: the various files from the Unicode consortium have been designed for easy parsing, so it shouldn't be too difficult to machine generate a C++ table from it. (I've done this for a number of other such files.) – James Kanze Dec 18 '13 at 16:30
  • Do any languages other than East Asian ones have full-width characters? – linquize Dec 20 '13 at 02:04
  • For a more complete discussion, see this answer: http://stackoverflow.com/a/9145712/53974 – Blaisorblade May 17 '14 at 14:34
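For the table-generation idea raised in the comments, one possible sketch is a small Python script that reads lines in the EastAsianWidth.txt format and prints a C++ range table. Everything named here is hypothetical (`parse_east_asian_width`, `to_cpp_table`, the `WideRange` struct, `kWideRanges`), and the `SAMPLE` lines are abbreviated stand-ins written in the file's format, not a verbatim excerpt; in practice you would feed in the real file.

```python
def parse_east_asian_width(lines):
    """Yield (first, last, width_class) tuples from EastAsianWidth.txt-style lines."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue  # blank or comment-only line
        code_range, width = (field.strip() for field in data.split(";"))
        first, _, last = code_range.partition("..")
        # A single code point has no "..", so last falls back to first.
        yield int(first, 16), int(last or first, 16), width


def to_cpp_table(entries, classes=("F", "W")):
    """Render the selected width classes as a C++ array of code point ranges."""
    rows = [
        f"    {{0x{lo:04X}, 0x{hi:04X}}},"
        for lo, hi, width in entries
        if width in classes
    ]
    header = (
        "struct WideRange { unsigned first, last; };\n"
        "static const WideRange kWideRanges[] = {\n"
    )
    return header + "\n".join(rows) + "\n};"


# Illustrative input only: abbreviated lines in the file's format.
SAMPLE = """\
# comment lines start with '#'
0041..005A;Na # LATIN CAPITAL LETTER A..Z
3000;F # IDEOGRAPHIC SPACE
4E00..9FFF ; W # CJK UNIFIED IDEOGRAPHS
FF01..FF03;F # FULLWIDTH EXCLAMATION MARK..NUMBER SIGN
""".splitlines()

print(to_cpp_table(parse_east_asian_width(SAMPLE)))
```

Because the ranges in the file are sorted, the generated array could be searched with a binary search (for example `std::upper_bound`) on the C++ side.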
-4

What do you mean by "full width"? The width of a character depends on the font it is being displayed in.

If you mean whether it is a single-byte character or not, it's still not clear. A single-byte character in what encoding? In UTF-8, it will be a single-byte character if (and only if) the code point is less than 128; if you're using UTF-16 (probable, since you're under Windows), just compare the character with 128. For a single-byte encoding in ISO 8859-1 (another widespread encoding): compare with 256. For anything less than 256, the UTF-16 unit will be numerically identical to the code point in ISO 8859-1 (sometimes known as Latin-1). For the single-byte encoding ASCII (almost never used today, but most of the common encodings are identical with it for the first 128 code points), anything less than 128 is good.
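The thresholds above can be checked with a short Python sketch. The helper `encoded_length` is a name made up for this example, and note that it measures encoded byte length only, which is a different property from the East Asian Width of a character.

```python
def encoded_length(ch: str, encoding: str) -> int:
    """Number of bytes ch occupies in the given encoding (hypothetical helper)."""
    return len(ch.encode(encoding))


# U+0041 'A', U+00E9 (e with acute), U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
for ch in ["A", "\u00E9", "\uFF21"]:
    cp = ord(ch)
    print(
        f"U+{cp:04X}",
        "utf-8 bytes:", encoded_length(ch, "utf-8"),
        "utf-16 bytes:", encoded_length(ch, "utf-16-le"),
        "fits ASCII:", cp < 128,
        "fits Latin-1:", cp < 256,
    )
```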

James Kanze
  • @Roddy That makes more sense. I should have looked up his second full-width character in my Unicode encoding. (Of course, it basically means that there isn't a simple answer.) – James Kanze Dec 18 '13 at 16:00