
I found a passage in a book that says the following:

In Python 3.X, the normal str string handles Unicode text (including ASCII, which is just a simple kind of Unicode); a distinct bytes string type represents raw byte values (including media and encoded text); and 2.X Unicode literals are supported in 3.3 and later for 2.X compatibility (they are treated the same as normal 3.X str strings).
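Here is a minimal Python 3.3+ sketch of the three kinds of literal the passage describes. The variable names are illustrative only. It shows that a `u`-prefixed literal is an ordinary `str`, while a `b`-prefixed literal is a separate `bytes` type whose items are raw integer byte values.

```python
# Python 3.3+ only (u"..." is a SyntaxError in 3.0 to 3.2).
plain = "naïve"            # normal 3.X str: Unicode text
prefixed = u"naïve"        # 2.X-style Unicode literal, accepted for compatibility

# The u prefix changes nothing: same type, same value.
assert type(prefixed) is str
assert prefixed == plain

# bytes is a distinct type holding raw 8-bit values.
raw = b"na\xc3\xafve"      # the UTF-8 encoding of "naïve"
assert type(raw) is bytes
assert raw != plain        # a bytes object never compares equal to a str

# Indexing a str gives a character; indexing bytes gives an int (0-255).
assert plain[2] == "ï"
assert raw[2] == 0xC3
```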

Question: what are 2.X Unicode literals?

In Python 2.X, the normal str string handles both 8-bit character strings (including ASCII text) and raw byte values; a distinct unicode string type represents Unicode text; and 3.X bytes literals are supported in 2.6 and later for 3.X compatibility (they are treated the same as normal 2.X str strings).

Question: what are 3.X bytes literals?
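To make both prefixes concrete, here is a minimal sketch written to run unchanged under Python 2.6+ and 3.3+, the versions the book names. The variable names are illustrative. It relies on the fact that in 2.6+ the name `bytes` is simply an alias for `str`, so a `b"..."` literal has type `bytes` on both lines, while a `u"..."` literal is `unicode` in 2.X and `str` in 3.X.

```python
# Runs unchanged on Python 2.6+ and 3.3+.
raw = b"na\xc3\xafve"      # 3.X-style bytes literal: raw 8-bit values
text = u"na\u00efve"       # 2.X-style Unicode literal ("naïve", using an escape for the i-diaeresis)

# In 2.6+, bytes is another name for str, so this holds on both versions.
assert type(raw) is bytes
# The Unicode literal is unicode in 2.X and str in 3.X, but never the byte type.
assert type(text) is not bytes

# Decoding the raw bytes as UTF-8 yields the text; encoding reverses it.
assert raw.decode("utf-8") == text
assert text.encode("utf-8") == raw
```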

Mark Tolonen
Amare.m
    A Unicode literal is prefixed with a `u`, as in `u'Motörhead'`. In Python 3.3 and later that's the same as `'Motörhead'`. A bytes literal has a leading `b`, as in `b'na\xc3\xafve'` (where `b'na\xc3\xafve'.decode() == 'naïve'`). – Matthias Jan 29 '21 at 13:20
  • @Matthias: Thank you, but what is the implication of the sentence "2.X Unicode literals are supported in 3.3 and later for 2.X compatibility (they are treated the same as normal 3.X str strings)"? One more question: in the sentence "In Python 2.X, the normal str string handles both 8-bit character strings (including ASCII text) and raw byte values", does it mean that the characters in the 8-bit representation of Unicode text are ASCII characters? Also, what is a raw byte? – Amare.m Jan 29 '21 at 13:33
    Does this answer your question? [r"string" b"string" u"string" Python 2 / 3 comparison](https://stackoverflow.com/questions/54533637/rstring-bstring-ustring-python-2-3-comparison) – JosefZ Jan 29 '21 at 14:44
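On the follow-up about 8-bit strings and raw bytes, a small sketch may help: the same five characters turn into different raw byte values depending on the encoding, and ASCII defines only the values 0 to 127. UTF-8 and Latin-1 are assumed here purely as example encodings; the code runs on Python 2.6+ and 3.x.

```python
text = u"na\u00efve"   # "naïve": five characters

utf8 = text.encode("utf-8")
latin1 = text.encode("latin-1")

# Same text, different raw bytes (shown as integers 0-255).
assert len(text) == 5
assert list(bytearray(utf8)) == [110, 97, 195, 175, 118, 101]   # i-diaeresis takes 2 bytes
assert list(bytearray(latin1)) == [110, 97, 239, 118, 101]      # i-diaeresis takes 1 byte

# ASCII covers only 0-127, so it cannot represent the i-diaeresis at all.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    ascii_ok = False
else:
    ascii_ok = True
assert not ascii_ok
```

In other words, an 8-bit Python 2 `str` is just a sequence of byte values; whether those bytes are ASCII, Latin-1, or UTF-8 depends on how they were produced, not on the type.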

1 Answer


It's saying that in Python 2, strings are not Unicode by default; they are old-fashioned sequences of 8-bit characters (ASCII, or an "ANSI" code page such as Windows-1252). So if you want to put a constant string in quotes in your source code (which is what "literal" means) and have Python 2 interpret it as a Unicode string rather than a byte string, you have to put a "u" in front of it to declare that explicitly.

```
Python 2.7.18 (default, Aug  4 2020, 11:16:42)
>>> type("hello")
<type 'str'>
>>> type(u"hello")
<type 'unicode'>
>>>
```

In Python 3, the str class is always Unicode, so you can add the "u" if you want, but it makes no difference: all strings are Unicode anyway. Python 3.3 and later accept the "u" prefix (it was reintroduced by PEP 414) simply to avoid breaking older scripts that use it; Python 3.0 to 3.2 reject it.

```
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
>>> type("hello")
<class 'str'>
>>> type(u"hello")
<class 'str'>
>>>
```
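A minimal Python 3.3+ sketch of the same point, checked at the parser level rather than in the REPL. `compiles` is a hypothetical helper written only for this illustration. It also shows one limit of the compatibility: only the plain `u` prefix came back, and Python 2's combined raw-Unicode prefix `ur` is still a syntax error in Python 3.

```python
import ast

# With or without the u prefix, the literal evaluates to the same str.
plain = ast.literal_eval('"hello"')
prefixed = ast.literal_eval('u"hello"')
assert prefixed == plain and type(prefixed) is str


def compiles(source):
    """Hypothetical helper: True if `source` is a valid expression here."""
    try:
        compile(source, "<snippet>", "eval")
    except SyntaxError:
        return False
    return True


# The plain u prefix is accepted; the 2.X ur prefix was not brought back.
assert compiles('u"hello"')
assert not compiles('ur"hello"')
```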
eemz