Determine width in terminal of Asian/Japanese characters?

Question

In my terminal these are equally wide:

ヌー平行
parallel
æøåüäöûß

[Screenshots: same width of "ヌー平行" and "parallel"; same width of "ヌ" and "p"]

I have managed to get Perl to give the length 8 for the last 2 lines, but it reports the length of the first line as 4. Is there a way for me to determine that the width of ヌ is twice that of ø?

Does this answer your question? [How to determine whether a unicode character is fullwidth or halfwidth in Perl](https://stackoverflow.com/questions/70834053/how-to-determine-whether-a-unicode-character-is-fullwidth-or-halfwidth-in-perl) — Shawn, Mar 07 '23 at 17:25
@mob Does it? All the fixed-width fonts I have tried act the same way. — Ole Tange, Mar 07 '23 at 17:26
@Ole It would be more correct to say that this depends on your terminal's font rendering engine, which often overrides the font's spacing to force fixed-width text. Reasonable terminals will display full-width CJK chars across two columns, but I'm not aware of any standard that would require this. — amon, Mar 07 '23 at 17:27
Relevant standard: https://www.unicode.org/reports/tr11/tr11-40.html — Mark Tolonen, Mar 07 '23 at 17:28
This has nothing to do with UTF-8 or Unicode in particular, but [halfwidth and fullwidth forms](https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms). — AmigoJack, Mar 10 '23 at 11:00

ikegami · Answer 1 · 2023-03-07T18:17:44.860

You can use Text::CharWidth's mbswidth. It uses POSIX's wcwidth.

```perl
use v5.14;
use warnings;

use utf8;                                 # this source file is UTF-8
use open ':std', ':encoding(UTF-8)';      # assumes a UTF-8 terminal/locale

use Encode             qw( encode_utf8 );
use Text::CharWidth    qw( mbswidth );
use Unicode::Normalize qw( NFC NFD );

# [ label, string, expected terminal width in columns ]
my @tests = (
   [ "ASCII",     "parallel",      8 ],
   [ "NFC",       NFC("æøåüäöûß"), 8 ],
   [ "NFD",       NFD("æøåüäöûß"), 8 ],
   [ "EastAsian", "ヌー平行",      8 ],
);

for ( @tests ) {
   my ( $name, $s, $expect ) = @$_;
   my $length = length( $s );                 # number of code points
   my $got = mbswidth( encode_utf8( $s ) );   # columns per wcwidth; assumes UTF-8 locale
   printf "%-9s length=%2d expect=%d got=%d\n",
      $name, $length, $expect, $got;
}
```

```
ASCII     length= 8 expect=8 got=8
NFC       length= 8 expect=8 got=8
NFD       length=13 expect=8 got=8
EastAsian length= 4 expect=8 got=8
```

Note that `mbswidth` expects a string encoded using the locale's encoding. The program above assumes that encoding is UTF-8 in two places: the `:encoding(UTF-8)` layer and the `encode_utf8` call.

If you want to know the number of columns a string should take according to Unicode, this is covered by Unicode Standard Annex #11. Note that the answer may depend on whether one is in an East Asian context or not. For example, U+03A6 GREEK CAPITAL LETTER PHI ("Φ") takes up two columns in an East Asian context, while it takes up only one otherwise.
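If you would rather follow UAX #11 directly than depend on the locale, Perl's built-in regex properties expose the East_Asian_Width property (`\p{Ea=W}`, `\p{Ea=F}`, `\p{Ea=A}`). The following is a minimal sketch under that assumption, not a full `wcwidth` replacement: the helper `uax11_width` and its East Asian context flag are hypothetical, and it ignores control characters, emoji sequences, and other special cases.

```perl
use v5.14;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

use Unicode::Normalize qw( NFD );

# Hypothetical helper: approximate the column width of a string from the
# East_Asian_Width property (UAX #11). A true second argument treats
# Ambiguous (Ea=A) characters as two columns, as in an East Asian context.
sub uax11_width {
    my ( $s, $east_asian_context ) = @_;
    my $width = 0;
    for my $ch ( split //, $s ) {
        if ( $ch =~ /[\p{Mn}\p{Me}\p{Cf}]/ ) {
            # Combining marks and format characters take no column.
            next;
        }
        elsif ( $ch =~ /[\p{Ea=W}\p{Ea=F}]/ ) {
            $width += 2;    # Wide and Fullwidth
        }
        elsif ( $ch =~ /\p{Ea=A}/ ) {
            $width += $east_asian_context ? 2 : 1;    # Ambiguous
        }
        else {
            $width += 1;    # Narrow, Halfwidth, Neutral
        }
    }
    return $width;
}

for my $s ( "parallel", "æøåüäöûß", NFD("æøåüäöûß"), "ヌー平行", "Φ" ) {
    printf "length=%2d width=%d east_asian_width=%d\n",
        length($s), uax11_width($s), uax11_width( $s, 1 );
}
```

The zero-width categories are tested first because UAX #11 classifies the combining diacritics U+0300 to U+036F as Ambiguous; without that check, the NFD string would be counted as wider than its NFC form.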

`wcwidth` returns -1 for "errors" (non-printable characters), and Text::CharWidth's `mbswidth` doesn't treat that case specially, so you end up with 2 + -1 = 1. You could submit a ticket suggesting alternative behaviour, such as returning `undef`. — ikegami, Mar 08 '23 at 06:13
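One way to get the behaviour suggested in this comment without changing the module is to measure the string one character at a time with Text::CharWidth's `mbwidth` and stop at the first -1. A minimal sketch, assuming a UTF-8 locale as the answer does; the helper name `strict_width` is hypothetical.

```perl
use v5.14;
use warnings;
use utf8;

use Encode          qw( encode_utf8 );
use Text::CharWidth qw( mbwidth );

# Hypothetical helper: like mbswidth, but returns undef if any character is
# non-printable (wcwidth gave -1) instead of adding -1 to the total.
# Assumes the locale's encoding is UTF-8.
sub strict_width {
    my ( $s ) = @_;
    my $width = 0;
    for my $ch ( split //, $s ) {
        my $w = mbwidth( encode_utf8( $ch ) );
        return undef if $w < 0;
        $width += $w;
    }
    return $width;
}

for my $s ( "ヌー平行", "ヌ\tー" ) {
    my $w = strict_width( $s );
    say defined $w ? "width=$w" : "contains a non-printable character";
}
```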
