I would like to format strings for display with curses: pad the line with spaces if it is too short, and cut it and append continuation characters if it is too long.

My problem is computing the actual display width of the string in my terminal. The string can contain characters that are not shown at all and others that take up two columns (and are printed as a square). Consequently, len is not suitable here.

To solve that, I tried to use unicodedata.category to filter characters:

import unicodedata

# Drop nonspacing marks (Mn) and "other symbols" (So) before measuring
removed_category = ('Mn', 'So')
string = ''.join(char for char in string
                 if unicodedata.category(char) not in removed_category)

But it does not work in all cases. For instance, 'S' is detected as an uppercase letter, but in the terminal its width is not one. Moreover, it is slow and not very elegant.
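To illustrate why the category alone is not enough, here is a small check. It assumes the problematic character is a fullwidth form such as U+FF33 (FULLWIDTH LATIN CAPITAL LETTER S); that code point is my assumption for illustration, and the character in the real data may be a different one.

```python
import unicodedata

# Assumption: U+FF33 stands in for the problematic character.
ch = "\uff33"

category = unicodedata.category(ch)        # general category of the character
east_asian = unicodedata.east_asian_width(ch)  # East Asian Width property

print(category)    # 'Lu' (uppercase letter)
print(east_asian)  # 'F' (fullwidth, usually drawn over two columns)
```

The general category says "uppercase letter" in both the ASCII and the fullwidth case, so filtering on it cannot tell a one-column character from a two-column one.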

I also tried string.printable, but it also removes characters that can be shown in my terminal.

EDIT (since the question is closed)

Solutions based on unicodedata.east_asian_width do not work with zero-width characters. If we combine that with unicodedata.category, it seems to work:

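import unicodedata

def stringWidth(string):
    width = 0
    for c in string:
        # Treat marks (M*) and control/format/other (C*) characters as zero-width
        if unicodedata.category(c)[0] in ('M', 'C'):
            continue
        w = unicodedata.east_asian_width(c)
        # Neutral, narrow, halfwidth and ambiguous characters take one column
        if w in ('N', 'Na', 'H', 'A'):
            width += 1
        else:
            # Wide ('W') and fullwidth ('F') characters take two columns
            width += 2

    return width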

Depending on your text, you may need to add other categories for zero-width characters.
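For the original goal (padding short lines and cutting long ones), a minimal sketch built on the same width rules could look like the following. The names char_width and fit_to_width and the "..." marker are hypothetical choices of mine, and the sketch assumes columns is non-negative and the marker is plain narrow ASCII.

```python
import unicodedata


def char_width(c):
    """Columns for one character, using the same rules as stringWidth."""
    if unicodedata.category(c)[0] in ("M", "C"):
        return 0
    return 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1


def fit_to_width(text, columns, marker="..."):
    """Pad or cut text so that it occupies exactly `columns` columns."""
    total = sum(char_width(c) for c in text)
    if total <= columns:
        # Short enough: pad on the right with spaces.
        return text + " " * (columns - total)

    # Assumption: the marker is narrow ASCII, so len() is its width.
    budget = columns - len(marker)
    if budget < 0:
        # Not even the marker fits; show as much of it as possible.
        return marker[:columns]

    kept = []
    used = 0
    for c in text:
        w = char_width(c)
        if used + w > budget:
            break
        kept.append(c)
        used += w

    # A wide character that did not fit can leave one spare column; pad it.
    return "".join(kept) + " " * (budget - used) + marker


print(repr(fit_to_width("abc", 5)))
print(repr(fit_to_width("abcdefgh", 5)))
```

Cutting is done per character rather than by slicing, so a two-column character is never split and a combining mark stays attached to the character before it.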

rools