Why is the Unicode literal needed if strings are already Unicode in Python 3?

Question

In my understanding, strings in Python 3 are UTF-8, so we should be able to use any Unicode code point except those in the private use areas. Why, then, is the Unicode literal u"..." still needed?
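For reference, a Python 3 `str` is a sequence of Unicode code points rather than UTF-8 bytes; UTF-8 is one possible byte encoding of it (and the default source-file encoding). A minimal sketch, with illustrative variable names only:

```python
# A Python 3 str holds Unicode code points; encoding produces bytes.
text = "caf\u00e9"
encoded = text.encode("utf-8")  # one possible byte representation of text

print(len(text))     # 4 code points
print(len(encoded))  # 5 bytes: U+00E9 needs two bytes in UTF-8
print(type(text).__name__, type(encoded).__name__)  # str bytes

# Private-use code points are ordinary str characters as well.
pua = "\ue000"
print(len(pua), hex(ord(pua)))  # 1 0xe000
```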

PEP 414 contains the passage below, but I'm not sure what it means:

Python 2 supports a concept of "raw" Unicode literals that don't meet the conventional definition of a raw string: \uXXXX and \UXXXXXXXX escape sequences are still processed by the compiler and converted to the appropriate Unicode code points when creating the associated Unicode objects.

Python 3 has no corresponding concept - the compiler performs no preprocessing of the contents of raw string literals. This matches the behaviour of 8-bit raw string literals in Python 2.
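A minimal Python 3 sketch of the difference the quoted passage describes (variable names are illustrative). Per the PEP, Python 2's `ur"\u00e9"` would still yield the single character é; that cannot run here, so the sketch only shows that Python 3 rejects the `ur` prefix and keeps `\u` escapes literally in raw strings:

```python
# Python 3.3+: the u prefix is accepted again (PEP 414) but changes nothing.
plain = "caf\u00e9"
prefixed = u"caf\u00e9"
print(plain == prefixed, type(prefixed) is str)  # True True

# A raw literal does not process \uXXXX: backslash, 'u' and four hex digits remain.
escaped = "\u00e9"
raw = r"\u00e9"
print(len(escaped), len(raw))  # 1 6

# Python 2's "raw Unicode" prefix ur"..." has no Python 3 equivalent.
try:
    compile('ur"\\u00e9"', "<example>", "eval")
    ur_rejected = False
except SyntaxError:
    ur_rejected = True
print(ur_rejected)  # True
```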

So that you can write code that runs in Python 2 _and_ Python 3 (it was reintroduced in [Python 3.3](https://docs.python.org/3/whatsnew/3.3.html)). — jonrsharpe, Jan 04 '21 at 10:49
To quote the PEP, *"...substantially increasing the number of lines of existing Python 2 code in Unicode aware applications that will run without modification on Python 3."* — jonrsharpe, Jan 04 '21 at 10:54
The PEP already has substantial reasoning; can you please clarify what *more* you need? — MisterMiyagi, Jan 04 '21 at 11:02
@MisterMiyagi, thanks for the follow-up as always. Hope you can provide something valuable. — mon, Jan 04 '21 at 13:14
@mon I might if it were clear what you are looking for. The PEP linked in the question mentions the general reason (backwards compatibility) for including ``u"..."`` at least twice before the cited section. Since the cited section addresses a more specific topic (raw Unicode, aka ``ur"..."``), it's not really clear to me whether you are asking about the general reason and just missed it, or about the reason for that specific topic, which the PEP glosses over. — MisterMiyagi, Jan 04 '21 at 14:50
@MisterMiyagi, someone already provided the answer. Against PEP 20, "There should be one-- and preferably only one --obvious way to do it", PEP 414 has reintroduced an additional, redundant way: annotating Unicode as "this is Unicode". Doesn't that raise any questions for you? — mon, Jan 05 '21 at 00:59
@mon Indeed the feature in isolation does. Yet the very PEP cited in the question gives the reason in its very first sentence, plus an extensive rationale, so with the material presented in the question it should already answer itself; the question seems as redundant as a Unicode annotation. Doesn't that raise any questions for you? — MisterMiyagi, Jan 05 '21 at 07:13

score 2 · Accepted Answer · answered Jan 04 '21 at 10:56

It is redundant in Python 3, but permitted in order to facilitate partial portability to Python 2 (e.g. for scripts written to be cross-compatible using six or similar helpers).
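A hedged sketch of what such cross-compatible code might look like, assuming the targets are Python 2.7 and Python 3.3+. The `text_type` alias mirrors what `six.text_type` provides, but it is defined by hand here (a hypothetical local name) so the snippet needs no third-party package:

```python
# Intended to run unchanged on Python 2.7 and Python 3.3+ (assumed targets).
GREETING = u"caf\u00e9"  # unicode on Python 2, str on Python 3

try:
    text_type = unicode  # Python 2 builtin; does not exist on Python 3
except NameError:
    text_type = str      # Python 3: str is already Unicode

assert isinstance(GREETING, text_type)
print(len(GREETING))  # 4 on both versions
```

Without the `u` prefix, the same literal would be a byte string on Python 2, which is exactly the portability gap PEP 414 closes.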

tripleee

