In Python 3, len() on a Unicode string gives you the number of Unicode code points, but I can't figure out how to get the final display width of a string, given that some characters combine.
Genesis 1:1 -- בְּרֵאשִׁית, בָּרָא אֱלֹהִים, אֵת הַשָּׁמַיִם, וְאֵת הָאָרֶץ
>>> len('בְּרֵאשִׁית, בָּרָא אֱלֹהִים, אֵת הַשָּׁמַיִם, וְאֵת הָאָרֶץ')
60
But the string is only 37 characters wide when displayed. Normalization doesn't solve the problem because the vowels (the dots underneath the larger characters) are distinct combining characters, and they remain separate code points even after NFC normalization.
>>> import unicodedata
>>> len(unicodedata.normalize('NFC', 'בְּרֵאשִׁית, בָּרָא אֱלֹהִים, אֵת הַשָּׁמַיִם, וְאֵת הָאָרֶץ'))
60
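For reference, here is a rough sketch of the kind of count I'm after. It assumes that combining marks and format characters take up no columns and that East Asian wide/fullwidth characters take two; the name display_width is my own, not something from the standard library, and real terminals and fonts may disagree with these assumptions.

```python
import unicodedata

def display_width(text):
    """Approximate the number of columns `text` occupies (hypothetical helper)."""
    width = 0
    for ch in text:
        # Assumption: nonspacing marks (Mn), enclosing marks (Me) and
        # format characters (Cf) occupy no column of their own.
        if unicodedata.category(ch) in ('Mn', 'Me', 'Cf'):
            continue
        # Assumption: East Asian Wide (W) and Fullwidth (F) characters
        # occupy two columns; everything else occupies one.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width
```

On the Genesis string above this should come out to 37, since every vowel point is skipped. It does not handle everything (emoji sequences and spacing combining marks, for instance), so it is only an approximation.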
As a side note: the textwrap module is totally broken in this regard, aggressively wrapping where it shouldn't. str.format seems similarly broken.
- Similar question that was marked as a duplicate: Display width of unicode strings in Python
- The question it was marked as a duplicate of only addresses normalization: Normalizing Unicode