
Python makes it easy to pad and align ASCII strings, like so:

>>> print "%20s and stuff" % ("test")
                test and stuff
>>> print "{:>20} and stuff".format("test")
                test and stuff

But how can I properly pad and align Unicode strings that contain non-ASCII characters? I've tried several methods, but none of them produce aligned output:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 

def manual(data):
    for s in data:
        size = len(s)
        print ' ' * (20 - size) + s + " stuff"

def with_format(data):
    for s in data:
        print " {:>20} stuff".format(s) 

def with_oldstyle(data):   
    for s in data:
        print "%20s stuff" % (s)

if __name__ == "__main__":
    data = ("xTest1x", "ツTestツ", "♠️ Test ♠️", "~Test2~")
    data_utf8 = map(lambda s: s.decode("utf8"), data)

    print "with_format"
    with_format(data)
    print "with_oldstyle"
    with_oldstyle(data)
    print "with_oldstyle utf8"
    with_oldstyle(data_utf8)
    print "manual:"
    manual(data)
    print "manual utf8:"
    manual(data_utf8)

This gives varied output:

with_format
              xTest1x stuff
           ツTestツ stuff
   ♠️ Test ♠️ stuff
              ~Test2~ stuff
with_oldstyle
             xTest1x stuff
          ツTestツ stuff
  ♠️ Test ♠️ stuff
             ~Test2~ stuff
with_oldstyle utf8
             xTest1x stuff
              ツTestツ stuff
          ♠️ Test ♠️ stuff
             ~Test2~ stuff
manual:
             xTest1x stuff
          ツTestツ stuff
  ♠️ Test ♠️ stuff
             ~Test2~ stuff
manual utf8:
             xTest1x stuff
              ツTestツ stuff
          ♠️ Test ♠️ stuff
             ~Test2~ stuff

This is using Python 2.7.
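Part of the variation comes from what `len()` measures. For a Python 2 `str` (the `data` tuple) it counts UTF-8 bytes, and `%20s` / `{:>20}` pad byte strings by that same byte count. For a `unicode` object (`data_utf8`) it counts code points. Neither is the number of terminal columns. The small script below is illustrative only: it is written to run on both 2.7 and 3, and `byte_length` and `code_point_length` are hypothetical helper names. It prints both counts for the question's strings:

```python
from __future__ import print_function, unicode_literals

# The strings from the question, written with escapes so the file is pure ASCII:
# U+30C4 is katakana TSU, U+2660 is BLACK SPADE SUIT, U+FE0F is VARIATION SELECTOR-16.
DATA = ["xTest1x", "\u30c4Test\u30c4", "\u2660\ufe0f Test \u2660\ufe0f", "~Test2~"]


def byte_length(s):
    """Length of s encoded as UTF-8: what len() sees for a Python 2 byte str."""
    return len(s.encode("utf-8"))


def code_point_length(s):
    """Number of code points: what len() sees for a unicode string."""
    return len(s)


for s in DATA:
    print(repr(s), byte_length(s), code_point_length(s))
```

For ツTestツ this gives 10 bytes and 6 code points, while a terminal that draws each ツ two columns wide needs 8 columns. The byte-based padding therefore comes up 2 columns short, and the code-point-based padding overshoots by 2.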

camomilk
  • I think `data_utf8` should be renamed to `data_unicode`, as it contains the latter. – robyschek May 19 '16 at 22:52
  • It may be related to this question: http://stackoverflow.com/questions/4622357/how-to-control-padding-of-unicode-string-containing-east-asia-characters – Jacques Gaudin May 19 '16 at 23:34
  • You may be interested in the Unicode Standard's concept of a ["grapheme cluster"](http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries), which roughly corresponds to the characters perceived by a user reading a piece of text, and third-party modules for computing grapheme clusters, like [`uniseg.graphemecluster`](http://uniseg-python.readthedocs.io/en/latest/graphemecluster.html). You might want to do some sort of additional handling for zero-width characters, though, and of course for non-monospaced fonts, padding would work very differently. – user2357112 May 19 '16 at 23:34
  • You generally can't, because that text might be rendered with different fonts, and you don't know what width every character will have. E.g., on my box `'ツ'` is slightly wider than `'a'`. – roeland May 20 '16 at 04:17
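The comments mention zero-width characters and perceived characters. A rough, stdlib-only approximation of column width is possible with `unicodedata`: skip combining marks, variation selectors and format/control characters, count East Asian Wide/Fullwidth characters as two columns, and count everything else as one. This is a sketch under those assumptions. It is not a grapheme-cluster segmenter, it is not equivalent to `wcwidth`, and `display_width` and `rjust_display` are hypothetical names. As roeland points out, the result only holds for a monospace renderer.

```python
from __future__ import unicode_literals

import unicodedata


def display_width(s):
    """Approximate number of monospace terminal columns needed to show s."""
    width = 0
    for ch in s:
        # Nonspacing/enclosing marks (e.g. U+0301, U+FE0F), format characters
        # (e.g. U+200D ZERO WIDTH JOINER) and control characters take no column.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf", "Cc"):
            continue
        # Wide and Fullwidth characters (CJK ideographs, kana) take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def rjust_display(s, width, fillchar=" "):
    """Right-align s in `width` columns, padding by display width instead of len()."""
    return fillchar * max(0, width - display_width(s)) + s
```

This sketch counts ambiguous-width characters such as U+2660 as one column, which suits most Western-locale terminals. A terminal that draws `♠️` (spade plus U+FE0F) as a two-column emoji will still be off by one per symbol.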

1 Answer


There is a `wcwidth` module available via pip (`pip install wcwidth`).

test.py:

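# -*- coding: utf-8 -*-
# requires: pip install wcwidth
import wcwidth

def manual_wcwidth(data):
    for s in data:
        # wcswidth returns the number of terminal columns s occupies
        # (2 per wide character such as kana, 0 for combining marks),
        # or -1 if s contains a non-printable character
        size = wcwidth.wcswidth(s)
        print ' ' * (20 - size) + s + " stuff"

data = (u"xTest1x", u"ツTestツ", u"♠️ Test ♠️", u"~Test2~")
manual_wcwidth(data)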

In the Linux console, this script gives me perfectly aligned lines:

[Screenshot: the script's output in the Linux console, with all four lines aligned]

However, when I run the script in PyCharm, the line with kana is still shifted one character to the left, so this also seems to depend on the font and renderer:

[Screenshot: the same script's output in PyCharm, with the kana line shifted one character to the left]
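The `manual_wcwidth` function above only right-aligns and assumes `wcswidth` succeeds. Because `wcwidth.wcswidth` returns -1 when the string contains a non-printable character, `20 - size` would then produce 21 spaces. The following is a minimal sketch of a reusable helper built on the same `wcwidth` package. The names `text_width` and `pad`, the `align` parameter, and the fallback to `len()` when `wcwidth` is missing or returns -1 are illustrative choices, not part of `wcwidth` itself.

```python
from __future__ import print_function, unicode_literals

try:
    from wcwidth import wcswidth  # third-party: pip install wcwidth
except ImportError:
    wcswidth = None  # hypothetical fallback so the sketch still runs without wcwidth


def text_width(s):
    """Terminal columns for s; falls back to len(s) if wcwidth is missing or returns -1."""
    if wcswidth is not None:
        width = wcswidth(s)
        if width >= 0:
            return width
    # wcswidth returns -1 for strings containing non-printable characters
    return len(s)


def pad(s, width, align="right", fillchar=" "):
    """Pad s to `width` columns; align is 'left', 'right' or 'center'."""
    gap = max(0, width - text_width(s))
    if align == "left":
        return s + fillchar * gap
    if align == "center":
        left = gap // 2
        return fillchar * left + s + fillchar * (gap - left)
    return fillchar * gap + s


if __name__ == "__main__":
    data = ("xTest1x", "\u30c4Test\u30c4", "\u2660\ufe0f Test \u2660\ufe0f", "~Test2~")
    for s in data:
        print(pad(s, 20) + " stuff")
```

Without `wcwidth` installed, the fallback pads by code points, so wide characters will again misalign. The `try`/`except` only keeps the sketch importable.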

robyschek
  • This works well for our purposes (we are using OS X Terminal to display the text). Thanks for the tip! We should probably note that this requires "pip install wcwidth" – camomilk May 20 '16 at 16:31
  • Japanese and Chinese characters are full-width characters and are slightly wider than other characters in most (all?) fonts. About 3 full-width characters equal the width of 4 regular characters, so having only two in the string results in slightly off alignment due to rounding. – Mark Tolonen May 21 '16 at 13:32
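If we take Mark Tolonen's ratio at face value (a font-dependent approximation: one full-width character is about 4/3 of a regular one), it accounts for the PyCharm result. `wcswidth` budgets 2 columns per ツ, so `ツTestツ` receives 20 - 8 = 12 spaces, but the two ツ are rendered narrower than that:

$$
\underbrace{12 + 4 + 2 \cdot \tfrac{4}{3}}_{\text{rendered width}} \approx 18.67,
\qquad 20 - 18.67 \approx 1.33 \text{ columns short}
$$

This is roughly the one-character shift to the left seen in the PyCharm screenshot.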