I would like to format strings to display them with curses.
For that, I would like to add spaces if the line is too short, and cut it and add continuation characters if it is too long.
My problem is computing the real display width of the string in my terminal. The string can contain characters that are not shown at all and others that take up two columns (and are printed as a square). Consequently, the function len is not a suitable candidate.
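To make the mismatch concrete, here is a small sketch with two sample strings chosen for illustration (they are not from my real data): an 'e' followed by a combining acute accent, and a single CJK ideograph. len counts code points, so it reports 2 for the first, which terminals that render combining marks show in one column, and 1 for the second, which typically takes two columns.

```python
import unicodedata

# Illustrative samples (assumptions, not taken from the original text).
samples = {
    'combining accent': 'e\u0301',  # 'e' + U+0301 COMBINING ACUTE ACCENT
    'CJK ideograph': '\u4e2d',      # U+4E2D
}

for name, s in samples.items():
    # len() counts code points, not terminal columns.
    print(name,
          len(s),
          [unicodedata.category(c) for c in s],
          [unicodedata.east_asian_width(c) for c in s])
```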
To solve that, I tried to use unicodedata.category to filter characters:
    import unicodedata

    removed_category = ('Mn', 'So')
    string = ''.join(char for char in string
                     if unicodedata.category(char) not in removed_category)
But it does not work in all cases. For instance, 'S' is detected as an uppercase letter, but its width in the terminal is not one. Moreover, it is slow and not very elegant.
I also tried string.printable, but it also removes characters that can be shown in my terminal.
EDIT (since the question is closed)
Solutions based on unicodedata.east_asian_width do not work with zero-width characters. If we combine that with unicodedata.category, it seems to work:
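    import unicodedata

    def stringWidth(string):
        width = 0
        for c in string:
            # Skip zero-width characters (marks and control/format characters)
            if unicodedata.category(c)[0] in ('M', 'C'):
                continue
            w = unicodedata.east_asian_width(c)
            if w in ('N', 'Na', 'H', 'A'):
                # Neutral, narrow, halfwidth and ambiguous: one column
                width += 1
            else:
                # Fullwidth ('F') and wide ('W'): two columns
                width += 2
        return width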
Depending on your text, you may need to add other categories for zero-width characters.
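For the original goal (padding short lines and cutting long ones), the same width rules could be applied per character. The following is only a sketch: char_width and fit_to_width are hypothetical helper names, the continuation marker is assumed to be plain ASCII so that its width equals its length, and the result is not checked against any particular terminal.

```python
import unicodedata


def char_width(c):
    """Columns used by one character, with the same rules as stringWidth."""
    if unicodedata.category(c)[0] in ('M', 'C'):
        return 0  # marks and control/format characters: assumed zero-width
    if unicodedata.east_asian_width(c) in ('F', 'W'):
        return 2  # fullwidth and wide characters
    return 1


def fit_to_width(string, columns, continuation='...'):
    """Pad or cut string so that it occupies exactly `columns` columns."""
    total = sum(char_width(c) for c in string)
    if total <= columns:
        # Too short (or exact): pad with spaces on the right.
        return string + ' ' * (columns - total)
    # Too long: shorten the marker if needed, then keep what fits before it.
    continuation = continuation[:columns]
    budget = columns - len(continuation)
    kept, used = [], 0
    for c in string:
        w = char_width(c)
        if used + w > budget:
            break
        kept.append(c)
        used += w
    # A wide character that did not fit can leave one column unused.
    return ''.join(kept) + ' ' * (budget - used) + continuation
```

For example, fit_to_width('abcdefgh', 5) gives 'ab...', and fit_to_width('abc', 5) gives 'abc' followed by two spaces.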