
Apparently the ur"" syntax has been disabled in Python 3. However, I need it! "Why?", you may ask. Well, I need the u prefix because the string must be Unicode and my code needs to work on Python 2. As for the r prefix, maybe it's not essential, but the markup format I'm using requires a lot of backslashes, and it would help avoid mistakes.

Here is an example that does what I want in Python 2 but is illegal in Python 3:

tamil_letter_ma = u"\u0bae"
marked_text = ur"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
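You can confirm that this is illegal in Python 3 by compiling each literal form directly. The sketch below uses `accepts_literal`, a hypothetical helper for illustration. On Python 3.3+ it should report that `u""` and `r""` compile while `ur""` raises `SyntaxError`:

```python
def accepts_literal(source):
    """Return True if `source` compiles as a single Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

# Each entry is the source text of a literal, not the literal's value.
for literal in ('u"\\u0bae"', 'r"\\a%s\\b"', 'ur"\\a%s\\b"'):
    print(literal, accepts_literal(literal))
```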

After coming across this problem, I found http://bugs.python.org/issue15096 and noticed this quote:

It's easy to overcome the limitation.

Would anyone care to offer an idea about how?

Related: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?

Jim K

3 Answers


Why not just use a raw string literal (r'...')? You don't need to specify u, because in Python 3 all strings are Unicode strings.

>>> tamil_letter_ma = "\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
'\\aம\\bthe Tamil\\cletter\\dMa\\e'

To make this also work in Python 2.x, add the following __future__ import at the very beginning of your source file, so that all string literals in that file become Unicode.

from __future__ import unicode_literals
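Put together, a module written this way might look like the sketch below. It assumes the `__future__` import really is the first statement in the file; only comments and a docstring may precede it. The template contains no `\uXXXX` sequences, which matters because of the PEP 414 caveat about raw literals under `unicode_literals` on Python 2.

```python
# -*- coding: utf-8 -*-
# Must be the first statement in the module (comments/docstring may precede it).
from __future__ import unicode_literals

# Under unicode_literals these are unicode on Python 2; on Python 3 they are str.
tamil_letter_ma = "\u0bae"
marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma

print(repr(marked_text))
```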
Mark Amery
falsetru
  • Interesting, but this forces _all_ string literals to become unicode strings. This may not be practical, and reverting to escaping everything so that the Python 3 version works might be the best solution. – Eric O. Lebigot May 31 '20 at 14:40
  • According to [PEP 414](https://www.python.org/dev/peps/pep-0414/), there is one **caveat** regarding Unicode escapes: `when using from __future__ import unicode_literals in Python 2, the nominally "raw" Unicode string literals will process \uXXXX and \UXXXXXXXX escape sequences, just like Python 2 strings explicitly marked with the "raw Unicode" prefix` – Marcin Wojnarski Mar 22 '21 at 22:28
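Eric O. Lebigot's suggestion of escaping everything avoids both `ur""` and the `__future__` import. The sketch below uses a plain `u""` literal with every backslash doubled. It relies on Python 3.3+ accepting the `u` prefix again (PEP 414), so it should parse unchanged on Python 2 and on Python 3.3+.

```python
# -*- coding: utf-8 -*-
# No raw prefix: each literal backslash is written as "\\".
# The u prefix is accepted by Python 2 and by Python 3.3+ (PEP 414).
tamil_letter_ma = u"\u0bae"
marked_text = u"\\a%s\\bthe Tamil\\cletter\\dMa\\e" % tamil_letter_ma

# Should equal the raw-template version from the question.
assert marked_text == r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
print(repr(marked_text))
```

The cost is readability: every backslash in the markup has to be doubled by hand, which is exactly the kind of mistake the r prefix was meant to prevent.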

The preferred way is to drop the u'' prefix and use from __future__ import unicode_literals, as @falsetru suggested. But in your specific case, you could abuse the fact that, in Python 2, "ascii-only string" % unicode returns a Unicode string:

>>> tamil_letter_ma = u"\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
u'\\a\u0bae\\bthe Tamil\\cletter\\dMa\\e'
jfs

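Unicode strings are the default in Python 3.x, so using r alone produces the same result as ur did in Python 2, with one difference: Python 2's ur"" still processed \uXXXX escapes, whereas a Python 3 r"" literal leaves them as-is.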
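Raw literals differ between Python 2's `ur""` and Python 3's `r""` in one place: `\uXXXX` escapes. The Python 3 check below shows this. According to PEP 414, on Python 2 `ur"\u0bae"` (and `r"\u0bae"` under `unicode_literals`) becomes the single character U+0BAE. The snippet only exercises the Python 3 side.

```python
# Python 3 sketch: raw literals leave \uXXXX untouched.
raw = r"\u0bae"      # six characters: \ u 0 b a e
cooked = "\u0bae"    # one character: U+0BAE

print(len(raw))
print(len(cooked))
print(raw == "\\u0bae", ord(cooked) == 0x0BAE)
```

This is why the question's approach of substituting the letter through `%s` is safer than writing `\u0bae` inside a raw template.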

cdonts
  • -1; this misses the point of the question, which is how to write a raw unicode literal that is simultaneously valid in *both* Python 2 and Python 3. – Mark Amery Aug 05 '16 at 22:14