
The arrangement of the characters that can be used as superscript/subscript letters seems completely chaotic. Most of them are obviously not meant to be used as superscript/subscript letters, but even those which are do not hint at any reasonable ordering. Unicode 6.0 at last added an alphabetically ordered run of subscript letters (h, k, l, m, n, p, s and t) at U+2095 through U+209C, but this was obviously squeezed into the remaining space in the block and covers less than a third of the alphabet.
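The ordering is easy to see by listing what the Superscripts and Subscripts block (U+2070 to U+209F) actually contains. Below is a minimal Python sketch that uses only the standard-library `unicodedata` module. The function name `subscript_letters_in_block` is our own, and the result depends on the Unicode database bundled with your Python (version 6.0 or later is assumed).

```python
import unicodedata

PREFIX = "LATIN SUBSCRIPT SMALL LETTER "

def subscript_letters_in_block(start=0x2090, end=0x209F):
    """Return (code point, letter) pairs for the Latin subscript letters
    found in the given code-point range, in code-point order."""
    found = []
    for cp in range(start, end + 1):
        # name() returns the supplied default (None) for unassigned code points
        name = unicodedata.name(chr(cp), None)
        if name and name.startswith(PREFIX):
            found.append((cp, name[len(PREFIX):].lower()))
    return found

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp, letter in subscript_letters_in_block():
        print(f"U+{cp:04X} {chr(cp)} {letter}")
```

With a Unicode 6.0+ database this lists a, e, o, x and schwa at U+2090 to U+2094, followed by the h-to-t run at U+2095 to U+209C.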

Why did the consortium not just allocate enough space for at least one sup and one subscript alphabet in lower case?

leftaroundabout
  • Related post - [Where are the other letters in this Unicode block?](https://superuser.com/q/999340/374397) & [Why is there no character for "superscript q" in Unicode?](https://www.quora.com/Why-is-there-no-character-for-superscript-q-in-Unicode) – RBT Jun 22 '18 at 03:59



The disorganization in the arrangement of these characters is because they were encoded piecemeal: as the scripts that used them were encoded, and as round-trip compatibility with other character sets was added. Chapter 15 of the Unicode Standard has some discussion of their origins. For example, superscript digits 1 to 3 were in ISO Latin-1, while the others were encoded to support the MARC-8 bibliographic character set (see table here); and U+2071 SUPERSCRIPT LATIN SMALL LETTER I and U+207F SUPERSCRIPT LATIN SMALL LETTER N were encoded to support the Uralic Phonetic Alphabet.
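The split between Latin-1 and the later block shows up directly in the code points of the superscript digits. The sketch below (Python, standard-library `unicodedata` only; the helper names `superscript_digits` and `to_superscript` are hypothetical) looks each digit up by its official character name. It shows that 1, 2 and 3 live in the Latin-1 range while 0 and 4 to 9 start at U+2070. It only demonstrates the resulting layout, not the MARC-8 history behind it.

```python
import unicodedata

DIGIT_NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]

def superscript_digits():
    """Map each digit 0-9 to the code point of its SUPERSCRIPT character."""
    # unicodedata.lookup() resolves a character from its official name
    return {d: ord(unicodedata.lookup("SUPERSCRIPT " + n))
            for d, n in enumerate(DIGIT_NAMES)}

def to_superscript(text):
    """Replace ASCII digits in text with superscript digits; other characters are unchanged."""
    # str.translate accepts a dict mapping code points to code points
    table = {ord(str(d)): cp for d, cp in superscript_digits().items()}
    return text.translate(table)

if __name__ == "__main__":
    for d, cp in superscript_digits().items():
        # 1-3 come from the Latin-1 range, the rest from U+2070 onwards
        print(d, f"U+{cp:04X}", chr(cp))
    print(to_superscript("x2 + y10"))
```

For instance, `to_superscript("x2 + y10")` gives `x² + y¹⁰`, even though the ² and ¹ come from a different range than the ⁰.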

The Unicode Consortium have a general policy of not encoding characters unless there's some evidence that people are using the characters to make semantic distinctions that require encoding. So characters won't be encoded just to complete the set, or to make things look neat.

Gareth Rees
  • So, they added a snowman with snow ☃ AND a snowman without snow ⛄, so that the weather forecaster of this world can avoid the dull snowflake ❄, but we will never get our missing superscript q‽ ᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖ☃ʳˢᵗᵘᵛʷˣʸᶻ – Hugo Raguet Aug 29 '17 at 22:38
  • @HugoRaguet in fairness, it's not possible to accomplish ☃ in the way you can do `q`, so this isn't completely absurd. I suppose we should just consider all sub- and superscript code points as deprecated. Still annoying of course. – leftaroundabout Oct 02 '17 at 10:08
  • Sure, Gareth Rees's point is perfectly clear. My comment was intended as a joke (and as the only opportunity I'll ever have to use the Unicode snowman). Although I _do_ miss the `q` exponent when writing maths in text-only emails encoded in Unicode... – Hugo Raguet Oct 02 '17 at 13:57
  • I am curious at what point this chase of non-existent semantics will finally end. I don't understand how it is possible that I can use some letters in subscripts but not all. Who decided that I don't actually need at least 25 characters? Was there a meeting where they considered all the pro and contra arguments? I am curious what was in the pro category. "Hey, no equations, that's good because people hate math anyway!" – Ashnur Sep 01 '18 at 02:00