From the link, the r prefix marks a raw string. Still, there are some situations that make it very hard for me to understand what r actually does. Here is what I have observed:

  1. 'H:\\Education' equals r'H:\Education', since the r prefix means escape characters are not converted. But when tested in Python, 'H:\Education' also equals 'H:\\Education'. So what does r do here, if \ equals \\ whether or not there is a leading r?
  2. Since 'H:\Education' equals 'H:\\Education', I would expect 'H:\Education\' to equal 'H:\\Education\\', 'H:\Education\\', or 'H:\\Education\', but these four are not the same in Python. Why? Does it depend on where the \ or \\ is?
  3. If r does nothing in item 1, why is r'C:\Program Files\7-Zip\7z.exe' right but 'C:\Program Files\7-Zip\7z.exe' not?
  4. r'H:\Education\' is wrong. Why?

So sometimes r has an effect and sometimes it does not. How can I tell the cases apart and make the right choice?

Y. zeng
  • The part you're missing is that a backslash followed by a character doesn't always have a meaning. With r'...', backslashes are always backslashes. Without the r, if the character after the \ is one of a specific set of characters, then the \ and that character are treated as a single character. Otherwise the \ is just a backslash. `'\\h'` is the same as `'\h'` because \h doesn't mean anything. But \7 does. – Frank Yellin Aug 24 '23 at 03:30
  • You can sometimes get by with writing only a single backslash in a non-raw string literal, *IF* the following character does not form a recognized escape sequence. For example, `\E` currently has no meaning, so writing `'\E'` produces the same string as `'\\E'` or `r'\E'`. However, this is dangerous to rely on, as a future version of Python might add new escape sequences, or declare that invalid escape sequences are an error. Your third example fails because `\7` *is* a recognized escape sequence: it produces the character with octal value 7. – jasonharper Aug 24 '23 at 03:30
  • @jasonharper Could you check the fourth item, which I added to the question? – Y. zeng Aug 24 '23 at 03:34
  • For that specific case, see https://stackoverflow.com/questions/647769/why-cant-pythons-raw-string-literals-end-with-a-single-backslash – jasonharper Aug 24 '23 at 03:35
  • You need to separate the VALUE of a string from the REPRESENTATION of that string. The string that we write `"\rZ\n"` contains three characters, two of which are not printable. It does not contain a backslash, nor does it contain the letter "r". If you view that string in a Python command-line, it will print it as five characters. If you `print` that string, you'll see only the letter Z and a blank line. If we write that string as `r"\rZ\n"`, now the string contains 5 characters, all printable, including two backslashes and the letter "r". – Tim Roberts Aug 24 '23 at 04:09
  • Your 4th item is a syntax error, because even in a raw string, we need a way to include the quote character. The `\'` sequence embeds an apostrophe in the string (in a raw string, the backslash stays in the string as well). You need to add an additional apostrophe to close the string. – Tim Roberts Aug 24 '23 at 04:12
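
The value-versus-representation point in the comments can be checked directly. The following is only a small sketch; the variable names `plain` and `raw` are made up for illustration:

```python
# The same characters between the quotes, with and without the r prefix.
plain = "\rZ\n"   # carriage return, Z, newline
raw = r"\rZ\n"    # backslash, r, Z, backslash, n

# len() counts the characters in the value; repr() shows the representation.
print(len(plain), repr(plain))
print(len(raw), repr(raw))
```

The lengths differ because the non-raw literal turns each escape sequence into one character, while the raw literal keeps every backslash.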

2 Answers


The r prefix stops escape sequences from being interpreted. \E happens to not be an escape sequence, so '\E' == r'\E', but '\n' is a newline character, for example. You should never rely on this behaviour just to save typing an extra backslash or an r prefix; it’s very fragile.
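
As a rough illustration of the difference, assuming nothing beyond the built-in string rules: the sketch below deliberately avoids writing `'\E'` itself, since recent Python versions emit a warning for unrecognized escapes, and compares the doubled and raw spellings instead. The variable names are made up for the example.

```python
# A doubled backslash in a normal literal and a single backslash in a raw
# literal both produce exactly one backslash character.
doubled = '\\E'
raw = r'\E'
print(doubled == raw, len(raw))

# \n is a recognized escape sequence, so the r prefix changes the value.
newline = '\n'        # one character: a line feed
raw_newline = r'\n'   # two characters: a backslash and the letter n
print(len(newline), len(raw_newline))
```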

As for a backslash at the end of a string literal: for better or for worse*, prefixes like r and f don’t affect the grammar of a string literal, just its interpretation. That means that in a code snippet like r'foo\', the \ is still serving to escape the closing quote, and you need to write r'foo' '\\' to get the desired string, for example.

>>> print(r'foo\bar' '\\')
foo\bar\

* better: parsing becomes easier, and forwards-compatible with new prefixes. worse: this issue.
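
A few equivalent ways to end a string with a backslash, as a sketch of the workaround described above (the variable names are only for illustration):

```python
# Three spellings of the 8-character string  foo\bar\
all_doubled = 'foo\\bar\\'      # non-raw literal, every backslash doubled
adjacent = r'foo\bar' '\\'      # adjacent literals are joined by the parser
concatenated = r'foo\bar' + '\\'
print(all_doubled == adjacent == concatenated)

# In a raw literal, \' keeps the quote from ending the string, and the
# backslash stays in the value too: f, o, o, backslash, apostrophe.
escaped_quote = r'foo\''
print(len(escaped_quote), escaped_quote)
```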

Ry-

The details are in the docs at String and Bytes literals. Pay particular attention to the recognized escape sequences table a few pages down. Here are some rules:

  1. \\ is converted to \
  2. \ooo (with o being octal digits) is converted to the character with that octal value
  3. unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result.
  4. Even in an r literal, quotes can be escaped with a backslash, but the backslash remains in the result

In...

  • 'H:\\Education', the double backslash is replaced with a single backslash by rule 1.
  • 'H:\Education', the backslash stays in the string by rule 3.
  • 'C:\Program Files\7-Zip\7z.exe', the \7 is converted to the character with octal value 7 by rule 2.
  • r'H:\Education\', the final quote is escaped (meaning it is treated as part of the literal), but that means that the literal string itself is never terminated with a closing quote, which is a syntax error.
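
The four cases can be checked with a short sketch. `parse_literal` is a hypothetical helper written for this example: it evaluates the source text of a literal and silences the warning that recent Python versions emit for unrecognized escapes. It assumes those escapes are still only a warning; a future Python may reject them outright.

```python
import warnings


def parse_literal(source_text):
    """Evaluate the source text of a string literal (hypothetical helper).

    Warnings about unrecognized escapes are silenced so the demo runs quietly.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source_text)


# Rule 1: a doubled backslash becomes a single backslash.
doubled = parse_literal(r"'H:\\Education'")
# Rule 3: \E is not a recognized escape, so the backslash is kept.
single = parse_literal(r"'H:\Education'")
# Rule 2: each \7 is an octal escape and becomes the character chr(7).
seven_zip = parse_literal(r"'C:\Program Files\7-Zip\7z.exe'")

print(doubled == single == r'H:\Education')
print(repr(seven_zip))

# Rule 4: the final \' is an escaped quote, so the literal never ends.
try:
    compile(r"r'H:\Education\'", "<demo>", "eval")
    unterminated = False
except SyntaxError:
    unterminated = True
print(unterminated)
```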

It's something of a dark art. The reason why people double-escape all backslashes in Windows paths is that it is easy to miss which things don't change because of rule 3 versus the multiple things that do change. Your 7z issue is a classic case. Better to double-escape everything than run the risk that you missed one of the rules.
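
As a sketch of that advice: the doubled and raw spellings give the same path, and a small check can flag a path that was silently damaged by an escape. `find_control_chars` is a hypothetical helper, not part of any library, and `damaged` spells out with explicit `\x07` escapes what the un-doubled, non-raw literal would have produced.

```python
def find_control_chars(path):
    """Return (index, character) pairs for control characters in path.

    Hypothetical helper: a control character in a Windows path usually
    means a backslash was swallowed by an escape sequence.
    """
    return [(i, ch) for i, ch in enumerate(path) if ord(ch) < 32]


doubled = 'C:\\Program Files\\7-Zip\\7z.exe'   # every backslash doubled
raw = r'C:\Program Files\7-Zip\7z.exe'         # raw literal
damaged = 'C:\\Program Files\x07-Zip\x07z.exe'  # result of the two \7 escapes

print(doubled == raw)
print(find_control_chars(doubled))
print(find_control_chars(damaged))
```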

halfer
tdelaney