22

r'\' in Python does not work as expected. Instead of returning a string with one character (a backslash) in it, it raises a SyntaxError. r"\" does the same.

This is rather cumbersome if you have a list of Windows paths like these:

paths = [ r'\bla\foo\bar',
          r'\bla\foo\bloh',
          r'\buff',
          r'\',
          # ...
        ]

Is there a good reason why this literal is not accepted?

Chris Morgan
Alfe
  • 1
    Actually, every \ as last character in such a literal raises this problem. – Alfe Apr 03 '12 at 12:37
  • 5
    Use `/` as path separator, or `os.path.sep` even in Windows; also use `os.path.split()` and `os.path.join()` as appropriate. – Burhan Khalid Apr 03 '12 at 12:38
  • The python raw string syntax is really there for doing regex, which don't usually end with \, but do often want to contain quote characters. Therefore \ is used to escape quotes. – Douglas Leeder Apr 03 '12 at 12:43
  • 3
    Just use `u'\N{REVERSE SOLIDUS}foo\N{REVERSE SOLIDUS}bar'` instead. – Josh Lee Apr 03 '12 at 12:54
  • @Josh: Very nice :D I like that. – Niklas B. Apr 03 '12 at 12:59
  • I wasn't looking for replacements, but thank you anyway. I was looking for explanation on the rationale behind this. The only provided rationale (that backslashes escape exactly one special character (the string literal quote) without being consumed in the process) does not make much sense to me. So in my eyes, the only real answer seems to be: This is a disputable decision of the Python designers. – Alfe Apr 03 '12 at 13:28
  • 2
    @Alfe: Perhaps a better way of wording it is that the Python designers considered your use case to be unlikely and not worth supporting, and preferred the simplicity of an LL(1) parser. (See [PEP 3099](http://www.python.org/dev/peps/pep-3099/) for a short comment on parser complexity.) That's why everyone else here is answering "don't do that then" -- you've found a small feature that was intended for making life easier in certain obscure edge cases, and you're complaining because it doesn't make *your* life easier. Everything in software engineering is a trade-off. – Daniel Pryden Apr 03 '12 at 16:52
  • 1
    @Daniel: Well said! Can you provide a reference that this "feature" would require a more complex parser? To me it's not immediately obvious why it would. – Niklas B. Apr 03 '12 at 19:54
  • @NiklasB.: To be more precise, it seems it would require a more complex *lexer*. My guess is that the lexer is not responsible for expanding character escapes; it simply follows the rule that a string token is not ended until it encounters a (matching) quote character that is not preceded by an odd number of backslash characters. Then the entire string is lexed as a single token, and the parser (or some other stage) handles expanding character escapes (or not, in the case of a raw string). But I don't know the code; this is just a mental model that seems to match the behavior I see. – Daniel Pryden Apr 04 '12 at 01:52
  • @Daniel: Yes, that sounds very sensible. I meant lexer, not parser :) Thanks for the additional thoughts, they seem to be consistent with this sentence from the docs: "even a raw string cannot end in an odd number of backslashes" – Niklas B. Apr 04 '12 at 01:55

5 Answers

30

This is in accordance with the documentation:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
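The claims in the quoted paragraph can be checked with a short sketch. The helper name `is_valid_literal` is made up for this illustration. The literal's source text is passed around as an ordinary escaped string so that the example file itself remains valid Python.

```python
import ast

# Each argument below is the *source text* of a literal, written as a normal
# escaped string; ast.literal_eval parses it the way the compiler would.
two_chars = ast.literal_eval('r"\\n"')      # source text: r"\n"
assert two_chars == "\\n" and len(two_chars) == 2

escaped_quote = ast.literal_eval('r"\\""')  # source text: r"\""
assert escaped_quote == '\\"' and len(escaped_quote) == 2


def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(is_valid_literal('r"\\"'))    # source text: r"\"   (odd number of backslashes)
print(is_valid_literal('r"\\\\"'))  # source text: r"\\"  (even number of backslashes)
```

The first call reports an invalid literal and the second a valid one, matching the "odd number of backslashes" rule from the documentation.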

Use "\\" instead, or, better yet, use / as the path separator (yes, this works on Windows).

Sven Marnach
  • 1
    What do they mean by "would escape the following quote character"? The behaviour as I experience it seems to be that the backslash just causes the following quote not to terminate the string, but is nevertheless included in the literal. Do you know what the reasoning behind that is? The quoted documentation doesn't seem to explain this (although it correctly defines it this way, of course). – Niklas B. Apr 03 '12 at 12:51
  • 1
    @NiklasB.: The meaning of "escape" isn't particularly well-defined. It means "bereave the following quote character of its special meaning, while keeping the backslash in the string anyway" here. I'm not sure about the rationale behind this. Might be to not confuse syntax highlighting in editors too much, or to simplify the lexer. I don't think it is useful from a user's point of view. – Sven Marnach Apr 03 '12 at 12:54
  • 1
    Thanks, that's consistent with my considerations – Niklas B. Apr 03 '12 at 13:03
  • So, it's simply a strange (and in my eyes useless) design decision? – Alfe Apr 03 '12 at 13:30
  • 4
    @Alfe: Simplifying the lexer isn't useless. And just look at the syntax coloring in your own post to see the negative effects of string literals ending in a `\ `. It was a design decision, but I wouldn't call it bad without further insight. The decision to use `\ ` as a path separator in DOS and Windows, on the other hand, *was* a bad decision, at least in hindsight. Fortunately, you can also use `/` in Python. – Sven Marnach Apr 03 '12 at 13:34
  • 1
    I can use / as a replacement only when using the file operations. In my case I'm just comparing strings :-/ Or: :-\ – Alfe Apr 03 '12 at 14:32
  • @Alfe: If you want to compare paths, you should normalise them anyway, and [`os.path.normpath()`](http://docs.python.org//library/os.path.html?highlight=os.path#os.path.normpath) will convert forward slashes to backslashes for you, so you *can* use slashes. – Sven Marnach Apr 03 '12 at 14:39
  • @sven: I'm on a Unix system receiving paths from a Windows system. I could use some fancy converter, sure, but this does not seem appropriate for the complexity of the situation. String comparison is sometimes the KISS solution. – Alfe Apr 03 '12 at 14:41
  • @Alfe: OK, comparing Windows paths on a Unix box might be the *one* situation where you should use backslashes in paths. :) – Sven Marnach Apr 03 '12 at 14:43
14

The backslash can be used to make a following quote not terminate the string:

>>> r'\''
"\\'"

So r'foo\' or r'\' are unterminated literals.

Rationale

Because you specifically asked for the reasoning behind this design decision, relevant aspects could be the following (although this is all based on speculation, of course):

  • Simplifies lexing for the Python interpreter itself (all string literals have the same semantics: a closing quote not preceded by an odd number of backslashes terminates the string)
  • Simplifies lexing for syntax highlighting engines (this is a strong argument because most programming languages don't have raw strings that are still enclosed in single or double quotes and lots of syntax highlighting engines are badly broken because they use inappropriate tools like regular expressions to do the lexing)
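The lexing argument can be illustrated with a minimal sketch. It assumes the mental model described in this thread (a backslash always protects the next character, whatever the string prefix) and is not CPython's actual tokenizer. The function `find_string_end` is a hypothetical helper that only handles single-line literals with one-character delimiters.

```python
def find_string_end(source, start):
    """Return the index of the quote closing the literal whose opening quote
    is at source[start], or -1 if the literal is unterminated.

    Illustration only: the scan never needs to know whether the literal has
    an r prefix, which is the simplification being discussed.
    """
    quote = source[start]
    i = start + 1
    while i < len(source):
        if source[i] == "\\":
            i += 2          # skip the backslash and the character it protects
            continue
        if source[i] == quote:
            return i        # first unprotected matching quote ends the literal
        i += 1
    return -1


# Source texts are given as escaped strings; index 1 is the opening quote.
print(find_string_end("r'\\''", 1))   # source text: r'\''
print(find_string_end("r'\\'", 1))    # source text: r'\'
print(find_string_end("r'\\\\'", 1))  # source text: r'\\'
```

Under this model the scanner finds a closing quote for `r'\''` and `r'\\'` but runs off the end of `r'\'`, which is why that literal is reported as unterminated.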

So yes, there are probably important reasons why this approach was chosen, even if you disagree with them because you consider your specific use case more important. It is not, however, for the following reasons:

  • You can just use normal string literals and escape the backslashes or read the strings from a raw file
  • backslashes in string literals are typically needed in one of these two cases:
    • you provide the string as input to another language interpreter which uses backslashes as a quoting character, like regular expressions. In this case you won't ever need a backslash at the end of a string
    • you are using \ as a path separator, which is usually not necessary because Python supports / as a path separator on Windows and because there's os.path.sep.

Solutions

You can use '\\' or "\\" instead:

>>> print("\\")
\

Or, if you're completely crazy, you can combine a raw string literal with a normal literal just for the trailing backslash, or even use string slicing:

>>> r'C:\some\long\freakin\file\path''\\'
'C:\\some\\long\\freakin\\file\\path\\'
>>> r'C:\some\long\freakin\file\path\ '[:-1]
'C:\\some\\long\\freakin\\file\\path\\'

Or, in your particular case, you could just do:

paths = [ x.replace('/', '\\') for x in '''

  /bla/foo/bar
  /bla/foo/bloh
  /buff
  /

'''.strip().split()]

Which would save you some typing when adding more paths, as an additional bonus.
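As a quick check, the spellings suggested in this thread all produce the same strings. The sketch below also includes a line-based variant of the list trick. That variant is an assumption on my part (it uses `splitlines()` so that a path containing spaces, such as the made-up `/my documents/foo`, stays in one piece) and is not part of the original answer.

```python
backslash = "\\"

# One character, however it is spelled.
assert len(backslash) == 1
assert backslash == chr(92) == "\N{REVERSE SOLIDUS}"

# Raw literal plus a normal literal, or a raw literal with the padding sliced off.
trailing = "C:\\some\\long\\path\\"
assert r"C:\some\long\path" "\\" == trailing
assert r"C:\some\long\path\ "[:-1] == trailing

# Line-based variant: split on lines rather than on any whitespace.
text = """
/bla/foo/bar
/my documents/foo
/
"""
paths = [line.strip().replace("/", "\\")
         for line in text.splitlines() if line.strip()]
print(paths)
```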

Niklas B.
  • I wasn't looking for replacements, but thank you anyway. I was looking for explanation on the rationale behind this. – Alfe Apr 03 '12 at 13:24
  • @Alfe: I was looking for a rationale behind this, but after doing that, I don't think there's much more to it other than simplifying parsing and maybe making it easier for syntax highlighters to get it right. So the short answer would be: No, there's no really good reason why this is not allowed. Did you downvote? If yes, why so? – Niklas B. Apr 03 '12 at 13:25
  • 1
    My question wasn't for replacements, that's why I downvoted. No offense, but my question wasn't answered by this. – Alfe Apr 03 '12 at 14:34
  • @Alfe: We can't tell you more than there is to it. Your question isn't very clear, as you can easily achieve what you want without using raw strings. What's your actual question? The Python designers decided it this way, the arguments provided here are rather strong, what more do you expect? – Niklas B. Apr 03 '12 at 14:35
  • By the way: You can't ask a "yes/no" question and expect more than a yes/no answer! – Niklas B. Apr 03 '12 at 14:38
  • I wanted to know what I wrote: "Is there a good reason why this literal is not accepted?" (I don't see what is unclear about it.) Currently the only reason seems to be a design decision concerning the lexer, and maybe some insight into the syntax highlighting mechanism of IDEs (but I'm pretty sure this is not a real problem). Actually I think the answer to my question is simply "no". – Alfe Apr 03 '12 at 14:39
  • 1
    @Alfe: Those design decisions are much more important than your "use case". Why don't you put those strings into a file or just escape the backslashes? It's not that this would present a problem. Please see my edit. – Niklas B. Apr 03 '12 at 14:52
2

That is because in raw strings, you need a way to escape single quotes when the string is delimited by single quotes. Same with double quotes.

http://docs.python.org/reference/lexical_analysis.html#string-literals

Steef
0

The answer to my question ("Why is a backslash not allowed as last character in raw strings?") seems to me to be "That's a design decision", and a questionable one at that.

Some answers tried to reason that the lexer and some syntax highlighters are simpler this way. I don't agree (and I have some background in writing parsers and compilers as well as in IDE development). It would be simpler to define raw strings with the semantics that a backslash has no special meaning whatsoever. Both lexer and IDE would benefit from this simplification.

The current situation is also a wart: if I want a quote in a raw string, I cannot use the backslash escape for it anyway, because the backslash stays in the string. I can only use it if I happen to want a backslash followed by a quote inside my raw string.
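The wart described above can be seen in a small sketch. The variable names are chosen for this illustration only.

```python
# A backslash before the quote keeps the literal open, but stays in the value.
with_backslash = r'\''
assert with_backslash == "\\'" and len(with_backslash) == 2

# To get a bare quote in a raw string, switch delimiters or use triple quotes.
bare_single = r"it's"
assert bare_single == "it's"

both_quotes = r'''say "it's"'''
assert both_quotes == 'say "it\'s"'
```

So the backslash-quote sequence in a raw string never yields a lone quote character. It only helps when the data really contains a backslash followed by a quote.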

I would propose to change this, but I also see the problem of breaking existing code :-/

Duncan Jones
Alfe
0

To address your root problem, you can use / in paths on Windows in Python just fine.

The r'' and r"" (raw) syntax is primarily for working with regular expressions. It doesn't really gain you anything when working with paths the way you expect, especially where the string ends with a \.

Otherwise, if you insist on using \, write '\\' or "\\": you have to escape the escape character, which is \. It isn't pretty; using / or os.path.sep is the best solution.
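For the case raised in the comments (handling Windows paths on a non-Windows machine), here is a minimal sketch using the standard library's `ntpath` module, which implements the Windows flavour of `os.path` and can be imported on any platform. The helper `same_windows_path` is a hypothetical name introduced for this example.

```python
import ntpath      # Windows flavour of os.path, importable everywhere
import posixpath   # POSIX flavour, shown only for comparison

assert ntpath.sep == "\\" and posixpath.sep == "/"


def same_windows_path(a, b):
    """Compare two Windows-style paths after normalising separators and case."""
    def normalise(p):
        return ntpath.normcase(ntpath.normpath(p))
    return normalise(a) == normalise(b)


# Forward slashes are converted, so no trailing-backslash literal is needed.
print(ntpath.normpath("/bla/foo/bar"))
print(same_windows_path("/bla/foo/bar", "\\bla\\foo\\bar"))
print(same_windows_path("C:/Some/Path/", "c:\\some\\path"))
```

Note that `ntpath.normpath` drops a trailing separator (except for a bare root), so this only fits when a trailing backslash is not significant for the comparison.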

  • the damn editor is escaping my escapes, and not showing what I actually typed in! I didn't type in what @NiklasB. is seeing. –  Apr 03 '12 at 12:45
  • These matters are one very bad area of Markdown: incompatible implementations. I've fixed it by using a double backtick as the delimiter rather than a single backtick. – Chris Morgan Apr 03 '12 at 12:47
  • 1
    @NiklasB.: `r'\\'` produces a double backslash, `'\\\\'` – Chris Morgan Apr 03 '12 at 12:49
  • @Chris Morgan: I was just quoting the answer here (note the *not*) – Niklas B. Apr 03 '12 at 12:50
  • My root problem is the impossibility of using \ as the last character in raw strings. The Windows paths were just an example (in which, in fact, a string I get from a text file is such a path with (sometimes) a trailing backslash and which I want to compare to my string which I have to denote somehow and which then of course should not replace the backslashes with slashes ;-). – Alfe Apr 03 '12 at 13:36