Raw unicode literal that is valid in Python 2 and Python 3?

Question

Apparently the ur"" syntax has been disabled in Python 3. However, I need it! "Why?", you may ask. Well, I need the u prefix because it is a unicode string and my code needs to work on Python 2. As for the r prefix, maybe it's not essential, but the markup format I'm using requires a lot of backslashes and it would help avoid mistakes.

Here is an example that does what I want in Python 2 but is illegal in Python 3:

tamil_letter_ma = u"\u0bae"
marked_text = ur"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma

After coming across this problem, I found http://bugs.python.org/issue15096 and noticed this quote:

It's easy to overcome the limitation.

Would anyone care to offer an idea about how?

Related: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?

score 14 · Accepted Answer · edited Apr 11 '19 at 13:03

Why not just use a raw string literal (r'...')? You don't need to specify u, because in Python 3 all strings are Unicode strings.

>>> tamil_letter_ma = "\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
'\\aம\\bthe Tamil\\cletter\\dMa\\e'

To make this work in Python 2.x as well, add the following future import at the very beginning of your source file, so that all string literals in that file become unicode.

from __future__ import unicode_literals
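Putting the two pieces together, a minimal sketch of a module that should run unchanged on Python 2.6+ and Python 3 might look like the following. The `text_type` alias is a hypothetical helper added here only to check the result type; it is not part of the original answer.

```python
# The future import must be the first statement in the file.
from __future__ import unicode_literals

import sys

# Hypothetical alias for illustration: the text type is `unicode` on
# Python 2 and `str` on Python 3. The conditional never evaluates the
# name `unicode` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

# A non-raw literal, so \u0bae is interpreted as the Tamil letter Ma.
tamil_letter_ma = "\u0bae"

# A raw literal: the backslashes required by the markup are kept as-is.
marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma

# On both major versions the result is a text (unicode) string.
assert isinstance(marked_text, text_type)
print(repr(marked_text))
```

Because the import affects every literal in the file, any literal that must stay a byte string on Python 2 would need an explicit b'' prefix.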

edited Apr 11 '19 at 13:03

Mark Amery

answered Oct 08 '15 at 23:02

falsetru

Interesting, but this forces _all_ string literals to become unicode strings. This may not be practical, and reverting to escaping everything so that the Python 3 version works might be the best solution. – Eric O. Lebigot May 31 '20 at 14:40

According to [PEP 414](https://www.python.org/dev/peps/pep-0414/), there is one **caveat** regarding Unicode escapes: `when using from __future__ import unicode_literals in Python 2, the nominally "raw" Unicode string literals will process \uXXXX and \UXXXXXXXX escape sequences, just like Python 2 strings explicitly marked with the "raw Unicode" prefix` – Marcin Wojnarski Mar 22 '21 at 22:28
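The caveat quoted from PEP 414 matters only when the raw literal itself contains a backslash followed by u or U. A small sketch of the difference, assuming the goal is a literal backslash followed by the text u0bae (the variable names are made up for illustration):

```python
from __future__ import unicode_literals

import sys

# Raw spelling: on Python 3 this is six characters (backslash, u, 0, b, a, e).
# On Python 2 with unicode_literals, PEP 414 notes that \uXXXX is still
# processed in "raw" unicode literals, so there it becomes one character.
raw_form = r"\u0bae"

# Non-raw spelling with a doubled backslash: six characters on both versions.
escaped_form = "\\u0bae"

print(len(raw_form), len(escaped_form))
print(sys.version_info[0], raw_form == escaped_form)
```

So for the rare literal that needs a literal \u sequence, doubling the backslash in a non-raw literal is one portable spelling.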

score 3 · Answer 2 · edited May 23 '17 at 12:32

The preferred way is to drop the u'' prefix and use from __future__ import unicode_literals, as @falsetru suggested. But in your specific case, you could abuse the fact that, in Python 2, "ascii-only string" % unicode returns Unicode:

>>> tamil_letter_ma = u"\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
u'\\a\u0bae\\bthe Tamil\\cletter\\dMa\\e'
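As a sketch of how this trick could sit in a file shared by both versions (assuming Python 2.x or Python 3.3+, since the u'' prefix was only restored in 3.3), with `template` and `text_type` being hypothetical names used for illustration:

```python
import sys

# Hypothetical alias for illustration: `unicode` on Python 2, `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

# u"" is accepted on Python 2 and again on Python 3.3+.
tamil_letter_ma = u"\u0bae"

# Plain raw literal: a byte string on Python 2, text on Python 3.
# It must stay ASCII-only, otherwise the implicit decode on Python 2 fails.
template = r"\a%s\bthe Tamil\cletter\dMa\e"

# Formatting with a unicode argument promotes the result to unicode on Python 2.
marked_text = template % tamil_letter_ma

assert isinstance(marked_text, text_type)
print(repr(marked_text))
```

This avoids changing every literal in the file, at the cost of relying on Python 2's implicit ASCII decoding of the template.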

edited May 23 '17 at 12:32

Community

answered Oct 09 '15 at 09:40

jfs

score -2 · Answer 3 · answered Oct 08 '15 at 23:02

Unicode strings are the default in Python 3.x, so using r alone will produce the same as ur in Python 2.

answered Oct 08 '15 at 23:02

cdonts

-1; this misses the point of the question, which is how to write a raw unicode literal that is simultaneously valid in *both* Python 2 and Python 3. – Mark Amery Aug 05 '16 at 22:14
