I am trying to get Unicode subscripts working with string formatting. I know I can do something like this:

>>> print('Y\u2081')
Y₁
>>> print('Y\u2082')
Y₂

But what I really need is something like this, since I need the subscript to iterate over a range. Obviously, this doesn't work:

>>> print('Y\u208{0}'.format(1))
  File "<ipython-input-62-99965eda0209>", line 1
    print('Y\u208{0}'.format(1))
         ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 1-5: truncated \uXXXX escape

Any help is appreciated.

Martijn Pieters
asdf
  • Related: [Printing subscript in python](https://stackoverflow.com/questions/24391892/printing-subscript-in-python) – smci Sep 10 '18 at 05:46

1 Answer

\uhhhh is escape syntax that is only recognized in a string literal, so it is parsed before .format() ever runs. You'd have to produce a raw string (where the escape syntax is ignored), format it, and then re-apply the normal Python parser handling of escapes:

import codecs

print(codecs.decode(r'Y\u208{0}'.format(1), 'unicode_escape'))

However, you'd be better off using the chr() function to produce the whole character:

print('Y{0}'.format(chr(0x2080 + 1)))

The chr() function takes an integer and returns the corresponding Unicode code point as a one-character string. The example above starts from the hexadecimal number 0x2080 (U+2080, SUBSCRIPT ZERO) and adds 1 to produce U+2081, the subscript one you wanted.
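Since the question asks for the subscript to iterate over a range, here is a minimal sketch of how the chr() approach could be wrapped up. The names `SUBSCRIPT_ZERO`, `subscript_digit`, `SUBSCRIPT_TABLE`, and `subscript_number` are hypothetical helpers introduced for illustration, not part of the original answer. The offset arithmetic only covers the single digits 0 through 9, so the multi-digit variant uses a `str.maketrans()` table instead.

```python
# U+2080 is SUBSCRIPT ZERO; the digits 0-9 occupy U+2080..U+2089.
SUBSCRIPT_ZERO = 0x2080


def subscript_digit(d):
    """Return the subscript character for one digit 0-9 (hypothetical helper)."""
    if not 0 <= d <= 9:
        raise ValueError("only the digits 0-9 map onto U+2080..U+2089")
    return chr(SUBSCRIPT_ZERO + d)


# Iterate over a range, as the question asks.
labels = ['Y{0}'.format(subscript_digit(i)) for i in range(10)]
print(labels)

# For indices with more than one digit, translate each digit of str(n).
SUBSCRIPT_TABLE = str.maketrans('0123456789', '₀₁₂₃₄₅₆₇₈₉')


def subscript_number(n):
    """Return a non-negative integer rendered in subscript digits (hypothetical helper)."""
    return str(n).translate(SUBSCRIPT_TABLE)


print('Y' + subscript_number(12))
```

The range check is there because adding a value outside 0 to 9 to 0x2080 would silently land on an unrelated character rather than raise an error.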

Martijn Pieters
  • this won’t work for superscripts; unicode superscripts are not a continuous block – taylor swift Jul 05 '16 at 18:06
  • @Kelvin: I wasn't even thinking about superscripts here. – Martijn Pieters Jul 05 '16 at 18:08
  • Worked for me... `['Y{0}'.format(chr(0x2080 + i)) for i in range(10)] Out[74]: ['Y₀', 'Y₁', 'Y₂', 'Y₃', 'Y₄', 'Y₅', 'Y₆', 'Y₇', 'Y₈', 'Y₉']` – asdf Jul 05 '16 at 18:08
  • @Kelvin: it'll work fine for `0` through to `9` however. – Martijn Pieters Jul 05 '16 at 18:10
  • yes, it will, for subscripts. Just pointing out that most ‘special variants’ don’t work this nicely. The superscripts, for example. Neither do the mathematical italics, which for some bizarre reason, the Unicode Consortium decided to encode the `'h'` in a completely different location than the rest of the alphabet. Assuming that you can count upwards with `chr()` in a predictable fashion will produce dumb and hard-to-detect bugs. – taylor swift Jul 05 '16 at 18:12
  • @Kelvin: in which case a manual dictionary is the way out. Here, it's not needed. – Martijn Pieters Jul 05 '16 at 18:15
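
For the superscript case raised in the comments, a manual dictionary could look roughly like the sketch below. The names `SUPERSCRIPTS`, `SUPERSCRIPT_TABLE`, and `superscript_number` are hypothetical and chosen only for illustration. The code points are listed explicitly because the superscript digits are split between the Latin-1 Supplement block (¹, ², ³) and the Superscripts and Subscripts block (⁰ and ⁴ through ⁹).

```python
# Superscript digits are not contiguous, so each one is spelled out by hand.
SUPERSCRIPTS = {
    '0': '\u2070',
    '1': '\u00b9',  # Latin-1 Supplement, not U+2071
    '2': '\u00b2',  # Latin-1 Supplement
    '3': '\u00b3',  # Latin-1 Supplement
    '4': '\u2074',
    '5': '\u2075',
    '6': '\u2076',
    '7': '\u2077',
    '8': '\u2078',
    '9': '\u2079',
}

# str.maketrans() accepts a dict whose keys are single characters.
SUPERSCRIPT_TABLE = str.maketrans(SUPERSCRIPTS)


def superscript_number(n):
    """Return a non-negative integer rendered in superscript digits (hypothetical helper)."""
    return str(n).translate(SUPERSCRIPT_TABLE)


print('x' + superscript_number(23))

# Counting upwards from U+2070 does not give superscript one:
# U+2071 is SUPERSCRIPT LATIN SMALL LETTER I.
print(chr(0x2070 + 1) == SUPERSCRIPTS['1'])
```

The last line prints `False`, which is the kind of quiet mismatch the comments warn about when assuming `chr()` offsets are predictable.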