
I just want to know which Unicode blocks can safely be used when I am limited to code points that encode as a single byte.

So, which is the last single-byte codepoint, and which is the first multi-byte codepoint?

loominade
  • I think you're confusing "Unicode" with "UTF" (see https://stackoverflow.com/q/643694/3141234). I think your question assumes UTF-8, which is the only Unicode encoding I know of that has single-byte values. – Alexander Oct 07 '21 at 13:08
  • UTF-8 starts generating multiple bytes at U+0080, restricting you to the ASCII subset, which is equivalent to not supporting Unicode at all. It has been done. – Hans Passant Oct 07 '21 at 13:09
  • If you used ISO-8859-1 you could use the first 256 codepoints. In most other cases, you'd be restricted to the ASCII subset, which is the first 128 codepoints. – Joachim Sauer Oct 07 '21 at 13:27

1 Answer


In UTF-8, the last single-byte code point is U+007F, and the first 2-byte code point is U+0080. That limits you to the Basic Latin (ASCII) block, U+0000 through U+007F.

See https://en.wikipedia.org/wiki/UTF-8#Encoding
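
If you want to check the boundary yourself, here is a minimal Python sketch. The helper names `utf8_length` and `is_single_byte_utf8` are made up for illustration; the only thing they rely on is the built-in `str.encode`.

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


def is_single_byte_utf8(text: str) -> bool:
    """Return True if every character in text encodes to one UTF-8 byte."""
    return all(ord(ch) <= 0x7F for ch in text)


# The boundary named in the answer: U+007F is the last 1-byte code point,
# U+0080 is the first one that needs 2 bytes.
print(utf8_length(0x7F))  # 1
print(utf8_length(0x80))  # 2

# For comparison, ISO-8859-1 (mentioned in the comments) keeps U+00FF in one byte.
print(len(chr(0xFF).encode("latin-1")))  # 1
```

So `is_single_byte_utf8("hello")` is true, while `is_single_byte_utf8("héllo")` is false, because é is U+00E9 and takes two bytes in UTF-8.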

Alexander