The width of an East Asian character is described in Unicode Standard Annex #11, which defines the East_Asian_Width property of a Unicode character.
Although I could find no way of querying this property using the standard Java 8 libraries, the ICU4J library (com.ibm.icu:icu4j in Maven) exposes it.
For example, the following code returns UCharacter.EastAsianWidth.WIDE:
int esw = UCharacter.getIntPropertyValue('あ', UProperty.EAST_ASIAN_WIDTH);
Some testing with Japanese characters has shown that all single-byte Shift JIS kana characters (e.g. halfwidth ｶ, U+FF76) are designated HALFWIDTH. Note that the ordinary katakana they correspond to (e.g. カ, U+30AB) are designated WIDE, like other kana and kanji such as あいうえお; FULLWIDTH is reserved for the fullwidth variants of otherwise narrow characters (e.g. Ａ, U+FF21). Plain ASCII characters such as Abc return NARROW.
The value AMBIGUOUS needs some extra care because the width of such characters varies with context. For instance, the Vim editor has an ambiwidth option that lets the user choose whether they should be treated as narrow or wide, since rendering is terminal dependent.
The aforementioned annex says of ambiguous characters: "Ambiguous characters occur in East Asian legacy character sets as wide characters, but as narrow (i.e., normal-width) characters in non-East Asian usage."
It also says of NEUTRAL: "Strictly speaking, it makes no sense to talk of narrow and wide for neutral characters, but because for all practical purposes they behave like Na, they are treated as narrow characters (the same as Na) under the recommendations below."
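However, I have found that NEUTRAL characters are not always rendered narrow, as some of them appear wide in the editors I have tried. Furthermore, ⅶ, ⅷ, ⅸ and ⅹ are AMBIGUOUS, while the characters that follow them, ⅺ and ⅻ, are NEUTRAL, which doesn't seem to make sense at first. Perhaps characters not mapped in ICU4J fall back to NEUTRAL, although a more likely explanation is that the Unicode data itself draws this line: the annex assigns AMBIGUOUS to characters that occur in East Asian legacy character sets, and ⅺ and ⅻ presumably do not occur in them.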
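As a rough sketch of how the recommendations quoted above could be turned into a column count, the following treats NEUTRAL like narrow and leaves the AMBIGUOUS decision to the caller, much like Vim's ambiwidth option does. The Width enum, the ColumnWidth class, and the classify callback are hypothetical names of my own, not part of ICU4J, and the classifier in main is only a stub so the example runs without any dependency.

```java
import java.util.function.IntFunction;

final class ColumnWidth {

    /** Hypothetical mirror of the six East_Asian_Width categories (not an ICU4J type). */
    enum Width { NEUTRAL, AMBIGUOUS, HALFWIDTH, FULLWIDTH, NARROW, WIDE }

    /** Columns for one category; ambiguousAsWide is the caller's policy for AMBIGUOUS. */
    static int columns(Width w, boolean ambiguousAsWide) {
        switch (w) {
            case FULLWIDTH:
            case WIDE:
                return 2;
            case AMBIGUOUS:
                return ambiguousAsWide ? 2 : 1;
            default:
                // HALFWIDTH, NARROW, and NEUTRAL (treated like narrow, as the annex recommends)
                return 1;
        }
    }

    /** Sums columns over code points rather than chars, so supplementary characters count once. */
    static int displayWidth(String s, IntFunction<Width> classify, boolean ambiguousAsWide) {
        return s.codePoints()
                .map(cp -> columns(classify.apply(cp), ambiguousAsWide))
                .sum();
    }

    public static void main(String[] args) {
        // Stub classifier for this demo only: hiragana letters are WIDE,
        // U+2176 (small Roman numeral seven) is AMBIGUOUS, everything else NARROW.
        IntFunction<Width> stub = cp ->
                (cp >= 0x3041 && cp <= 0x3096) ? Width.WIDE
                : (cp == 0x2176) ? Width.AMBIGUOUS
                : Width.NARROW;

        System.out.println(displayWidth("Abc\u3042", stub, false)); // prints 5
        System.out.println(displayWidth("\u2176", stub, true));     // prints 2
    }
}
```

With ICU4J on the classpath, the stub could presumably be replaced by a function that maps the int returned by UCharacter.getIntPropertyValue(cp, UProperty.EAST_ASIAN_WIDTH) onto this enum. As noted above, real editors and terminals do not always render NEUTRAL characters as narrow, so the result is only an estimate.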
Lastly, UCharacter.EastAsianWidth.COUNT is just a constant representing the number of values defined under UCharacter.EastAsianWidth, and not a value that getIntPropertyValue() will return.