I am having difficulty aligning Japanese characters in Python.

Code:

print "{c1}{name:>14s}{c2}{nick_name:>14s}{planes:>16s}".format(
    name=name, nick_name=nick_name, planes=planes, 
    c1=u.color['yellow'], c2=u.color['default']
)

Result: [screenshot: English-only names on the right, names containing Japanese characters on the left]

If the string contains only English letters and numbers, .format() works fine, as shown on the right.

The alignment goes wrong when the string contains Japanese characters, as shown on the left.

Interestingly, when aligning with {name:>14s}:

  • If "name" contains 4 JP characters, there are 2 leading spaces.
  • If "name" contains 3 JP characters, there are 5 leading spaces.
  • If "name" contains 2 JP characters, there are 8 leading spaces.
  • If "name" contains 0 JP characters, there are 14 leading spaces.

It seems that each Japanese character is counted as 3 spaces in this case.
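This count is consistent with the names being UTF-8 byte strings: hiragana such as あ take three bytes each in UTF-8, so a field width measured in bytes shrinks by three per character. A minimal sketch of that arithmetic, written for Python 3 (where text and bytes are separate types); the sample strings and the helper names `padding_by_bytes` and `padding_by_chars` are made up for illustration:

```python
# -*- coding: utf-8 -*-
WIDTH = 14  # field width taken from {name:>14s}


def padding_by_bytes(text, width=WIDTH):
    """Padding left over when the length is counted in UTF-8 bytes."""
    return max(width - len(text.encode("utf-8")), 0)


def padding_by_chars(text, width=WIDTH):
    """Padding left over when the length is counted in characters."""
    return max(width - len(text), 0)


# Hypothetical sample names with 4, 3, 2 and 0 Japanese characters.
for sample in (u"ありがと", u"ありが", u"あり", u""):
    # Byte-based padding is 2, 5, 8 and 14, matching the list above.
    print(len(sample), padding_by_bytes(sample), padding_by_chars(sample))
```

Counting characters instead of bytes leaves 10, 11, 12 and 14 spaces for the same samples.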

{name:<14s}, {name:^14s} and {name:>14s} all behave as described above.

I am using OS X 10.10.2, and the terminal font is Monaco.

Maybe this has something to do with full-width/half-width characters.

Is there any way to align Japanese characters just like English characters?

Thank you.


Edit:

Ignacio Vazquez-Abrams's answer is indeed the correct way.

  • Everyone who is dealing with unicode in Python should read the slide he pointed out.

  • "\u3000" is the full-width (ideographic) space used in CJK text. See this page.

  • Reviewing the Format String Syntax documentation would also help.

  • I would also like to recommend this SO answer, which helped me understand how unicode works in Python.

However, if the string contains both half-width and full-width characters, the alignment still goes wrong. A simple workaround is to use all full-width characters.
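A minimal sketch of that workaround, written for Python 3: it maps printable ASCII (U+0021 to U+007E) onto the Unicode full-width forms block by adding the offset 0xFEE0, and maps the ordinary space to "\u3000". The function name `to_fullwidth` is hypothetical, and the sample string is only an illustration.

```python
# -*- coding: utf-8 -*-
FULLWIDTH_OFFSET = 0xFEE0  # distance from "!" (U+0021) to "！" (U+FF01)


def to_fullwidth(text):
    """Replace half-width ASCII characters with full-width equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == u" ":
            out.append(u"\u3000")  # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # already wide, or not covered by this sketch
    return u"".join(out)


# Every character is now full-width, so padding with "\u3000" lines up.
print(u"{:\u3000>8s}".format(to_fullwidth(u"A6M 零戦")))
```

Characters outside printable ASCII, such as half-width katakana, pass through unchanged, so a string containing them would still need separate handling.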

user2875289

1 Answer

You're performing two goofups simultaneously:

  1. You're using a UTF-8 sequence of bytes instead of a sequence of characters.
  2. You're aligning using half-width spaces.

For the first, use unicode objects instead of str objects. For the second, pad with full-width spaces instead.

>>> print '{:>8s}'.format('ありがとう')
ありがとう
>>> print u'{:>8s}'.format(u'ありがとう')
   ありがとう
>>> print u'{:\u3000>8s}'.format(u'ありがとう')
　　　ありがとう
Ignacio Vazquez-Abrams
  • Thank you, this is indeed the correct answer. I have edited my post. Another question came to mind: do you know how to align strings that have both half-width and full-width characters? In theory, it may seem strange to mix half-width and full-width characters in a string, but in practice it's perfectly normal, since we almost never use full-width English letters and numbers. :) – user2875289 Apr 23 '15 at 09:40
  • You will need to iterate over the string, counting how many characters of each width there are, and then pad with the appropriate type and number of spaces. – Ignacio Vazquez-Abrams Apr 23 '15 at 13:14
  • I was thinking about the same thing, but I wonder whether there is a better way. Thank you again. – user2875289 Apr 23 '15 at 13:36
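
The counting approach suggested in the comments can be sketched with the standard library's `unicodedata.east_asian_width`. This is a sketch for Python 3, assuming the terminal draws wide (`W`) and full-width (`F`) characters exactly two columns wide and everything else one column wide, which may not hold for every font. `display_width` and `pad_left` are hypothetical helper names, and this variant pads with half-width spaces only rather than mixing both kinds of space.

```python
# -*- coding: utf-8 -*-
import unicodedata


def display_width(text):
    """Estimate the number of terminal columns the text occupies."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
        for ch in text
    )


def pad_left(text, columns):
    """Right-align text in a field of the given column count."""
    return u" " * max(columns - display_width(text), 0) + text


# Mixed half-width and full-width names end at the same column.
for name in (u"あり", u"ありabc", u"abc"):
    print(pad_left(name, 14))
```

Combining marks and characters whose width is ambiguous are counted as one column here, so the estimate can be off for such text.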