I ran across something once upon a time and wondered if it was a Python "bug" or at least a misfeature. I'm curious if anyone knows of any justifications for this behavior. I thought of it just now reading "Code Like a Pythonista," which has been enjoyable so far. I'm only familiar with the 2.x line of Python.

Raw strings are strings that are prefixed with an r. This is great because I can use backslashes in regular expressions and I don't need to double everything everywhere. It's also handy for writing throwaway scripts on Windows, so I can use backslashes there also. (I know I can also use forward slashes, but throwaway scripts often contain content cut and pasted from elsewhere in Windows.)

So great! Unless, of course, you really want your string to end with a backslash. There's no way to do that in a 'raw' string.

In [9]: r'\n'
Out[9]: '\\n'

In [10]: r'abc\n'
Out[10]: 'abc\\n'

In [11]: r'abc\'
------------------------------------------------
   File "<ipython console>", line 1
     r'abc\'
           ^
SyntaxError: EOL while scanning string literal


In [12]: r'abc\\'
Out[12]: 'abc\\\\'

So one backslash before the closing quote is an error, but two backslashes give you two backslashes! Surely I'm not the only one who is bothered by this?
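The session above can be reproduced in Python 3 without typing the problem literals at the prompt, by passing their source text to `ast.literal_eval` (a standard-library function that evaluates a single literal). The helper name `try_literal` is made up for this sketch. It shows that the backslash always stays in the value, yet still stops the following quote from closing the literal:

```python
import ast


def try_literal(source):
    """Evaluate the string literal spelled out in `source`; return its value or the SyntaxError."""
    try:
        return ast.literal_eval(source)
    except SyntaxError as exc:
        return exc


# Each source string below is the literal exactly as it would be typed at the prompt.
print(repr(try_literal(r"r'abc\n'")))   # 'abc\\n'   (backslash kept)
print(type(try_literal(r"r'abc\'")))    # <class 'SyntaxError'>  (the quote does not close it)
print(repr(try_literal(r"r'abc\\'")))   # 'abc\\\\'  (both backslashes kept)
print(repr(try_literal(r"r'\''")))      # "\\'"      (backslash AND quote kept)
```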

Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa. If I wanted both, I'd just triple quote. If I really wanted three quotes in a row in a raw string, well, I guess I'd have to deal, but is this considered "proper behavior"?
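A short sketch of the alternatives the question mentions: switch the outer quote character, or triple-quote when both kinds appear. The paths are illustrative only. None of these spellings lifts the trailing-backslash restriction, because the closing delimiter still cannot directly follow a single backslash.

```python
# Outer double quotes: a single quote inside needs no backslash.
apostrophe = r"it's in C:\temp"
# Outer single quotes: double quotes inside need no backslash.
quoted = r'say "hi" from C:\temp'
# Triple quotes: both kinds are fine, as long as the delimiter never appears three times in a row.
both = r'''it's "C:\temp"'''

for s in (apostrophe, quoted, both):
    print(s)
```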

This is particularly problematic with folder names in Windows, where the backslash is the path delimiter.
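For folder names specifically, one sketch (an alternative not mentioned in this thread) is to leave the trailing backslash out of the literal and let the standard library add it. `ntpath` is the Windows implementation of `os.path` and can be imported on any platform; joining a folder with an empty final component appends a separator only if one is missing. The folder name is just an example.

```python
import ntpath

folder = r"C:\Users\me\Documents"

# Joining with an empty final component adds a trailing separator.
with_sep = ntpath.join(folder, "")
print(with_sep)                  # C:\Users\me\Documents\

# A path that already ends in a separator is left as it is.
print(ntpath.join("C:\\", ""))   # C:\
```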

drevicko
dash-tom-bang

4 Answers

It's a FAQ.

And in response to "you really want your string to end with a backslash. There's no way to do that in a 'raw' string.": the FAQ shows how to work around it.

>>> r'ab\c' '\\' == 'ab\\c\\'
True
>>>
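Applied to a Windows folder name (the path below is only an illustration), the same trick keeps every backslash readable except the final one, which lives in its own ordinary literal. A quick check against the fully escaped, non-raw spelling:

```python
# Adjacent literals are concatenated at compile time, so only the final
# piece needs an escaped backslash; the raw part keeps its backslashes as typed.
folder = r"C:\temp\new" "\\"

# The same string written as one ordinary (non-raw) literal.
escaped = "C:\\temp\\new\\"

print(folder == escaped)   # True
print(folder)              # C:\temp\new\
```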
John Machin
  • Certainly seems like a misfeature. – DS. May 20 '10 at 03:04
  • @DS: Your suggested alternative design for raw strings is ...? – John Machin May 20 '10 at 03:54
  • Didn't know it was a FAQ, but probably should have assumed as much. ;) Not speaking for @DS, but my alternative design is "no escape processing." You know, kinda like what it says on the tin? – dash-tom-bang May 20 '10 at 15:36
  • Looks like this FAQ has [moved to a new location](http://docs.python.org/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash). I think I could edit your answer if I had enough rep, but I do not. – oob Dec 09 '10 at 05:46
  • Seems like Python parses raw strings like regular strings, then "un-does" the escapes? Very poor behavior. That said, I think this could be "fixed" without breaking any existing code. – GLRoman Mar 26 '20 at 02:46

Raw strings are meant mostly for readably writing the patterns for regular expressions, which never need a trailing backslash; it's an accident that they may come in handy for Windows (where you could use forward slashes in most cases anyway -- the Microsoft C library which underlies Python accepts either form!). It's not considered acceptable to make it (nearly) impossible to write a regular expression pattern containing both single and double quotes, just to reinforce the accident in question.

("Nearly" because triple-quoting would almost always help... but it could be a little bit of a pain sometimes).

So, yes, raw strings were designed to behave that way (forbidding odd numbers of trailing backslashes), and it is considered perfectly "proper behavior" for them to respect the design decisions Guido made when he invented them;-).
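To illustrate the design point, here is a sketch of a pattern that needs both quote characters, written once as a single-quoted raw string (where `\'` keeps the literal open, and the regex engine then reads `\'` as a plain quote) and once with triple quotes, which needs no backslash at all:

```python
import re

# Character class matching either quote character. The backslash reaches the
# regex engine unchanged, and re treats "\'" as a literal single quote.
QUOTE = re.compile(r'[\'"]')
print(QUOTE.pattern)                       # [\'"]
print(QUOTE.findall('it\'s "x"'))          # ["'", '"', '"']

# The triple-quoted spelling of the same pattern avoids the backslash entirely.
QUOTE_TRIPLE = re.compile(r'''['"]''')
print(QUOTE_TRIPLE.findall('it\'s "x"'))   # ["'", '"', '"']
```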

Alex Martelli
  • Yes, I addressed the reason why I'm using backslashes in my OP. Thanks, though; my point was exactly that triple quoting would get around any problem with using quote characters in regular expressions. Indeed, I've wanted to have a trailing backslash but never a regexp with several different types of quote characters. – dash-tom-bang May 20 '10 at 15:35
  • This continues to boggle my mind why this is a thing. The stated reason that "it's the only way to have both single and double quotes in the string" falls flat because you always need to have a backslash before this necessary quote mark and that backslash persists in the compiled string. There is no way that I can see to create a string containing only single and double quotes short of triple-quoting. – dash-tom-bang Mar 02 '16 at 20:15
  • I wish I could upvote this more. I think the behavior is oddly inconsistent, but this answer gives some hints as to WHY the behavior is oddly inconsistent. – Dave C Jan 23 '18 at 22:11
  • Wait, raw string processing is built-in while regexps must be imported -- I'm not buying this. Python fails here and a fix would be most welcome. – GLRoman Mar 26 '20 at 02:48

Another way to work around this is:

 >>> print(r"Raw \with\ trailing backslash\ "[:-1])
 Raw \with\ trailing backslash\

Updated for Python 3 and removed unnecessary slash at the end which implied an escape.

Personally, I doubt I would use the above, except perhaps for a huge string containing more than just a path. For the example above I'd prefer a non-raw string with the backslashes doubled.

GravityWell
  • Oh joy, a "raw" string where we're escaping the escape character -- which is why most folks want raw strings in the first place! Python is FUBAR here. – GLRoman Mar 26 '20 at 02:50

Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa.

But that would then raise the question as to why raw strings are 'raw, except for embedded quotes?'

You have to have some escape mechanism, otherwise you can never use the outer quote characters inside the string at all. And then you need an escape mechanism for the escape mechanism.
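A simplified model of the rule, written as a hypothetical scanner (`raw_body_end` and `raw_value` are names invented for this sketch, not CPython's actual tokenizer): a backslash only tells the scanner to step over the next character while it looks for the closing quote, and nothing is removed from the value. That one rule accounts for both the kept backslash in `r'\''` and the error for a lone trailing backslash.

```python
def raw_body_end(src, start):
    """Index of the quote that closes the raw literal opened at src[start].

    Simplified model for non-triple-quoted literals only.
    """
    quote = src[start]
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # Step over the next character so it cannot end the literal;
            # both characters remain part of the value.
            i += 2
        elif ch == quote:
            return i
        elif ch == "\n":
            break  # an unescaped newline ends the line before the literal closes
        else:
            i += 1
    raise SyntaxError("EOL while scanning string literal")


def raw_value(src):
    """Value of a literal written as r'...' or r"...": the body, kept verbatim."""
    if src[:1] not in ("r", "R"):
        raise ValueError("expected a raw string literal")
    end = raw_body_end(src, 1)
    return src[2:end]


print(raw_value(r"r'abc\n'"))   # abc\n
print(raw_value(r"r'\''"))      # \'
print(raw_value(r"r'abc\\'"))   # abc\\
try:
    raw_value(r"r'abc\'")
except SyntaxError as exc:
    print("SyntaxError:", exc)  # SyntaxError: EOL while scanning string literal
```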

user207421
  • The rule "you can't use the surrounding quote character in the string" seems an easy one to follow and very pragmatic. In the exceptionally rare case that you need all four of single quote, double quote, tripled single quote, and tripled double quote, I think it would not be too out of line to say that those cannot all appear in one continuous raw string. When I want a raw string, I don't want escapes, so it seems stupid to have one escape at one location in a raw string that then causes an error. – dash-tom-bang Dec 08 '11 at 19:37
  • @dash-tom-bang That rule prevents you from using that character *at all.* Any rule that doesn't have that restriction is better than any rule that does. – user207421 Dec 13 '11 at 09:37
  • If the alternative is that you can't do something else that you might want to do (e.g. "have a trailing backslash") then the answer is not so black and white. "It's a raw string except..." violates the desire to "do the obvious thing"; exceptions to rules should be avoided when possible. – dash-tom-bang Jan 03 '12 at 00:59
  • @dash-tom-bang Your point eludes me. You haven't addressed the issue. There has to be a way to represent every character in a quoted string. Without an escape mechanism you can't represent the quotes. – user207421 Jan 07 '12 at 09:14
  • Say you want to represent a newline in a raw string? `r'\n'` gives you a string with two characters. If you leave the 'n' out you get an error. If you add another backslash you again have two characters, backslash-backslash. My point is that the error is an inconsistency for the sake of "there's no reason to ever do this, I have a complete understanding of any application that will ever be written." (To get a newline you need to create a multiline raw string.) – dash-tom-bang Jan 12 '12 at 02:22
  • @dash-tom-bang I can't make any sense out of that either. – user207421 Feb 04 '12 at 09:05
  • ok, fine. More simply, how do *you* use the "escape character" in raw strings to represent, well, anything? Please test your answers in the interpreter. I think you will find that the term "escape character" means nothing in a raw string. So again I ask, why does this "escape" character (really not an escape at all but rather a backslash) mean something special when in index -1? – dash-tom-bang Mar 20 '12 at 00:47
  • @dash-tom-bang You can use it before a single quote in a string terminated by single quotes, as in your own example. This is hardly something that requires further empirical verification. It doesn't even merit further discussion actually. – user207421 Mar 20 '12 at 01:00
  • You can use it before a single quote but the resulting string is a backslash followed by a single quote. If you want a single quote without a backslash right before it you're out of luck unless you use double quotes to surround your string. If you want both single and double quotes in your string, you're left needing to triple quote, which is also what I said before. I am even more annoyed now by this "facility" than before, which interprets the backslash as an escape but the backslash is kept in the remaining string. This appears to contradict your understanding of the "feature." – dash-tom-bang Mar 26 '12 at 19:28