
I have custom print functions I use to print numbers: an ASCII version and a UTF-16LE version. The UTF-16LE version uses the fullwidth code points for 0-9 and A-F when printing hexadecimal. While debugging these functions, I noticed that the characters looked a little different in Visual Studio from the ASCII characters. This didn't bother me, but it got me thinking, so I did a quick Google search for "Unicode halfwidth vs fullwidth".

...and I found several pages describing the "Fullwidth" form as referring to the visual width of the characters, whereas I had thought "Fullwidth" referred to the width of the encoding (2 bytes or more).

Here are a few pages and quotes from them:

It doesn't make sense to me that "Fullwidth" would refer to the visual width, when we have different fonts to control size and alignment.

Why does "Fullwidth" refer to the visual width? Where in the Unicode UTF-16 spec does it say this?

As a developer/programmer, would having the choice to output halfwidth or fullwidth characters using flags be desirable?

TylerH
b.sullender
    FWIW, half-width kana is specifically discussing Japanese kana. It's unrelated to what you're asking. –  Mar 29 '18 at 23:36
  • What are the exact codepoints for the characters you're discussing? – Mark Ransom Mar 29 '18 at 23:37
  • @MarkRansom U+FF10 - U+FF19 (0-9 Latin), U+FF21 - U+FF26 (A-F Latin), and U+FF41 - U+FF46 (a-f Latin)... Code points are still a new term to me, but I think that's what you're asking for. – b.sullender Mar 29 '18 at 23:53
  • Have a look at: http://www.fileformat.info/info/unicode/char/ff10/index.htm In this context half and full refer to the graphical representation of the character. You probably want to use this character U+0030 (to U+0039) http://www.fileformat.info/info/unicode/char/0030/index.htm – Richard Critten Mar 30 '18 at 00:30
  • Here's a reason they exist: https://stackoverflow.com/q/4622357/235698 – Mark Tolonen Mar 30 '18 at 05:36

2 Answers


Half-width kana, which you found, is just a subset of the Halfwidth and Fullwidth Forms, and width is a property of the code point/glyph, not of the encoding. UTF-16 is just one of the encodings for Unicode.

Those characters exist because Unicode was designed for lossless round-trip conversion to and from legacy character sets. If you look more closely at the Unicode blocks, you'll see many redundant characters like Ⅶ Ⅷ Ⅸ ㎆ ㎇ ㎎ ㎏ ㎐ Dz dz NJ... They're there purely for compatibility, because they were used in some legacy character sets.

See also What issues lead people to use Japanese-specific encodings rather than Unicode?

As a Developer/Programmer, would having the choice to output as Halfwidth or Fullwidth using flags be desirable?

Personally, I see no reason to use them except in some rare cases, like displaying characters on a square grid. What's worse, those Japanese characters are often rendered without ClearType and antialiasing at small sizes, so they're hard on the eyes. If you're in Japan, you'll notice some forms that require halfwidth or fullwidth characters without converting automatically, which is bad.

phuclv
  • I think Dz and some other digraphs have a currently valid reason for being: they are letters in the official alphabets of some languages. Of course, they could be decomposed, but a letter in an alphabet corresponds well with the concept of a single character. That also helps with collations that order them differently from their decomposed forms. – Tom Blodget Mar 30 '18 at 11:11
  • @TomBlodget but there are so many digraphs that if we had separate characters for them, other languages would ask why they don't have code points too, and there would be a constant need to add new ones. That would also make collation rules much more complex. Spanish has abandoned treating `ch` as a separate letter of the alphabet, and so have many other languages. – phuclv Mar 30 '18 at 11:33

You found your own answers about the origin of fullwidth vs. halfwidth, so I won't get into that. Yes, the designation refers to the visual width of the characters. Sorry, but I don't have an official reference for that.

One of the goals of Unicode is to handle round-trip conversions to and from any legacy character set without loss. Since some legacy character sets have fullwidth characters, those characters must also be part of Unicode, or they would get converted incorrectly.

I find it hard to imagine a circumstance in modern code where you would want a choice between normal and fullwidth characters. It's really only for legacy support.

Mark Ransom