Width of a string with zero-width and two-width characters, in Python 3 in a terminal (not in a GUI)

Question

I would like to format strings to show them with curses. For that, I would like to add spaces if the line is too short, and cut it and add continuation characters if it is too long.

My problem is computing the real width of the string in my terminal. The string can contain characters that are not displayed at all and others that occupy two columns (and are printed as a square). Consequently, len is not the right tool.

To solve that, I tried to use unicodedata.category to filter characters:

import unicodedata

removed_category = ('Mn', 'So')
string = ''.join(char for char in string
                 if unicodedata.category(char) not in removed_category)

But it does not work in all cases. For instance, 'Ｓ' is detected as an uppercase letter, but in the terminal its width is not one. Moreover, it is slow and not very elegant.
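To illustrate why a filter on the general category cannot catch this case, here is a small check on the fullwidth 'Ｓ' (U+FF33). The variable names are only for illustration; the two unicodedata calls are standard library functions.

```python
import unicodedata

fullwidth_s = '\uff33'  # 'Ｓ', FULLWIDTH LATIN CAPITAL LETTER S

# The general category only says "uppercase letter"...
category = unicodedata.category(fullwidth_s)
# ...while the East Asian Width property is what marks it as fullwidth.
width_class = unicodedata.east_asian_width(fullwidth_s)

print(category, width_class)
```

The category carries no information about display width, so a width-related property has to be consulted separately.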

I also tried string.printable, but it also removes characters that can be shown in my terminal.

EDIT (since the question is closed)

Solutions based on unicodedata.east_asian_width do not work with zero-width characters. If we combine that with unicodedata.category, it seems to work:

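import unicodedata

def stringWidth(string):
    width = 0
    for c in string:
        # Marks (M*) and control/format/other (C*) characters: zero width
        if unicodedata.category(c)[0] in ('M', 'C'):
            continue
        w = unicodedata.east_asian_width(c)
        # Neutral, narrow, halfwidth and ambiguous: one column
        if w in ('N', 'Na', 'H', 'A'):
            width += 1
        # Wide ('W') and fullwidth ('F'): two columns
        else:
            width += 2

    return width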

Depending on your text, you may need to add other categories for zero-width characters.
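For the original goal (padding short lines and cutting long ones), a minimal sketch along the same lines could look like this. The names char_width and fit_to_width and the '...' continuation marker are hypothetical choices, and the width rules are the same approximation as in stringWidth above, so the result is only as accurate as that approximation is for a given terminal.

```python
import unicodedata

def char_width(c):
    """Approximate number of terminal columns for one character."""
    # Same heuristic as stringWidth: marks and 'C*' categories take no column.
    if unicodedata.category(c)[0] in ('M', 'C'):
        return 0
    # Wide and fullwidth characters take two columns, everything else one.
    return 2 if unicodedata.east_asian_width(c) in ('W', 'F') else 1

def fit_to_width(text, columns, ellipsis='...'):
    """Pad text with spaces, or cut it and append ellipsis, to fill `columns`."""
    total = sum(char_width(c) for c in text)
    if total <= columns:
        return text + ' ' * (columns - total)
    budget = columns - sum(char_width(c) for c in ellipsis)
    if budget < 0:
        raise ValueError('columns is too small to hold the ellipsis')
    kept, used = [], 0
    for c in text:
        w = char_width(c)
        if used + w > budget:
            break
        kept.append(c)
        used += w
    # A two-column character that did not fit can leave one spare column;
    # fill it with a space so the result always spans exactly `columns`.
    return ''.join(kept) + ' ' * (budget - used) + ellipsis
```

For example, fit_to_width('abc', 5) pads to 'abc  ', and fit_to_width('abcdefgh', 6) gives 'abc...'. The returned string can then be passed to the curses drawing call of your choice.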

Yay, the joy of Unicode double-width characters. And combining marks. And zero-width characters. — Martijn Pieters, Feb 03 '18 at 14:17
I believe this is a very similar question to this: https://stackoverflow.com/questions/2455255/how-to-get-the-width-of-a-string-in-pixels — Olivier Melançon, Feb 03 '18 at 14:17
`unicodedata.east_asian_width()` can help but is not foolproof either. See the duplicates. — Martijn Pieters, Feb 03 '18 at 14:19
I will edit the title to insist on no GUI context to avoid confusion with other questions. — rools, Feb 03 '18 at 14:41
Thanks for your help, unfortunately `unicodedata.east_asian_width()` does not work for zero-width characters. — rools, Feb 03 '18 at 14:42
The suggested duplicate is interesting, but does not mention curses. — Thomas Dickey, Feb 03 '18 at 14:56
