-8

Edit
I'm not sure if this question is being read correctly.
I already know what string formats are in Python.
Every single little detail, I already know.
Please stop directing me to questions about string types in Python.

This is a specific question about the string delimiter appearing
in the body of a raw string literal.

I want to know why I can't use the raw syntax r"" or r'' form on this
raw string "word's" and have it exist in a variable just like this.

It doesn't matter why I want to do this, but I've explained below.

Thanks.


I'm just going over some syntax rules to parse and create
strings using the raw string syntax rules for r' ' and r" ".

For the record, I have read the docs and rules on raw strings.
The question is specific to escaping the delimiter within the raw string.

I have a utility that parses/makes other string types and is used
in production code.

I'm perplexed that Python does not remove the backslash from the escaped delimiter when the string is stored in a variable.

Is this by design, i.e. NOT removing the escape on the delimiter, or is it what I am
hoping: just a missed part of the parse process?
Basically, a bug?

The string is not really a raw image of the original if after parsing, it does
not look like the original.
After parsing, in a variable, it now becomes useless.

Is this an oversight and possibly something that will be corrected in the future?

As it is now, in my utility, I can only create a raw syntax form, but due to
this bug, I cannot parse it unless I take off the escape from the delimiter.

I mean, I guess I could do this as it is a direct inverse of making the string,
but it's disturbing that the lexical parser leaves this artificial escape in the variable after
the parsing process.

Here is some code I used to verify the problem:

Code

# Python 2.7.12

print "Raw target string test = \"word's\""

# Each raw literal must escape its own delimiter; the backslash stays in the value.
v1 = r' "word\'s" '     # =>  "word\'s" 
v2 = r" \"word's\" "    # =>  \"word's\" 

print "using r' ' syntax, variable contains  " + v1
print "using r\" \" syntax, variable contains  " + v2

if len(v1) == len(v2):
   print "lengths are equal"
else:
   print "lengths are NOT equal"

Output

Raw target string test = "word's"
using r' ' syntax, variable contains   "word\'s" 
using r" " syntax, variable contains   \"word's\" 
lengths are NOT equal

Either

Karl Knechtel
  • `r` is an instruction _not_ to interpret any characters within a string in any special way. Also, the Python interpreter does not distinguish single and double quotes as string delimiters, as long as they match pairwise. The following representations of a string are completely equivalent: `r'\''` `'\\\''` `"\\\'"` `"\\'"` `r"\'"`. Each of these representations refers to a two-element string consisting of a literal backslash and a single quote. – DYZ May 23 '17 at 01:10
  • @DYZ - It's not a duplicate, please read my question. Also, I don't care about C style strings, only raw syntax literals. If there is no way to get this raw string `"word's"` into a variable, unchanged, using raw string syntax, then if it's intended, it's useless. If not intended, it's a bug. That's all my question is. I was curious if this deleterious behavior might be cured In the future, or any other insight. That's all. –  May 23 '17 at 02:15
  • Your question, frankly speaking, is quite confusing. The only thing that `r` does is it turns off the special meaning of the backslash. If your string does not have any backslashes, adding `r` makes no difference – DYZ May 23 '17 at 02:19
  • @sln - It's not useless, it's quite useful when writing regular expressions and you don't have to escape every single backslash. That alone makes it a time saver. And it is intended to work like that - or rather, the 'inconsistent' appearance when interchanging double and single quotes is a collateral of Python syntax - if you could tell the interpreter any other way what are string boundaries you wouldn't have the issue with 'raw' strings. – zwer May 23 '17 at 02:19
  • @zwer - I don't care about regular expressions. The raw syntax is not consistent with any other language's raw syntax rules. It doesn't undo delimiter escaping when it becomes a variable. This is untenable and now undoable for my utility. –  May 23 '17 at 02:32
  • @DYZ - You don't see it. I have to introduce an _artificial_ escape for the string delimiter. That's expected. But it doesn't take it off as an _escaped delimiter_. Breaks the rules for parsing right out of the gate. Here is an example: Dot-Net raw string syntax `var str = @" ""word's"" ";` str contains `"word's"`. C++14, etc.. exactly the same rules. –  May 23 '17 at 02:39
  • Amazing to me that nobody at the time found the clear duplicate after all that discussion. – Karl Knechtel Jul 31 '22 at 04:36

2 Answers

1

To quote the Python FAQ, raw string literals in Python were "designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing". Since the regex engine will strip the backslash in front of the quote character, Python doesn't need to strip it. This behavior will most likely never be changed since it would severely break backwards compatibility.

So yes, it is by design -- although it is quite confusing.

> I want to know why I can't use the raw syntax r"" or r'' form on this raw string "word's" and have it exist in a variable just like this.

Python's raw string literals were not designed to be able to represent every possible string. In particular, the string `"'` cannot be represented within `r""` or `r''`. When you use raw string literals for regex patterns, this is not a problem, since the patterns `\"'`, `"\'`, `"'`, and `\"\'` are equivalent (that is, they all match the single string `"'`).
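The regex claim can be checked with a short sketch. It assumes Python 3.4+ (for `re.fullmatch`), not the Python 2.7 used in the question, and the constant names are only illustrative. The patterns are built character by character so that no literal quoting rules interfere:

```python
import re

# Build each pattern character by character so no quoting rules get in the way.
BACKSLASH, DQUOTE, SQUOTE = "\\", '"', "'"

target = DQUOTE + SQUOTE  # the two-character string "'

patterns = [
    BACKSLASH + DQUOTE + SQUOTE,              # \"'
    DQUOTE + BACKSLASH + SQUOTE,              # "\'
    DQUOTE + SQUOTE,                          # "'
    BACKSLASH + DQUOTE + BACKSLASH + SQUOTE,  # \"\'
]

# The regex engine treats a backslash before a quote as "match that quote".
matches = [re.fullmatch(p, target) is not None for p in patterns]
print(matches)  # [True, True, True, True]

# The first pattern is exactly what a double-quoted raw literal produces.
raw_pattern = r"\"'"
print(raw_pattern == patterns[0])  # True
```

So the backslash that Python leaves in the raw literal is harmless when the value is handed to `re`, which is the use case the FAQ describes.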

However, note that you can write the string `"word's"` using the triple-quoted raw string literal `r'''"word's"'''`.
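A minimal check of the triple-quoted form, written for Python 3 (variable names are illustrative only). It compares the raw literal with ordinary escaped literals of the same text:

```python
# Triple-quoted raw literal: the single quote inside no longer collides
# with the delimiter, so no backslash is needed.
raw_triple = r'''"word's"'''

# The same text written as ordinary (non-raw) literals, where the parser
# consumes the escaping backslash.
plain_single = '"word\'s"'
plain_double = "\"word's\""

print(raw_triple)                                  # "word's"
print(raw_triple == plain_single == plain_double)  # True
print(len(raw_triple), "\\" in raw_triple)         # 8 False
```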

Mathias Rav
  • How is your long quote on strings ending with a backslash related to the OP and even to your own answer? – DYZ May 23 '17 at 01:13
  • @DYZ The OP wrote "Is this by design?" and "Is this an oversight and possibly something that will be corrected in the future?". I believe that to be the core of OP's question, so that is the part I answered. The quote is an authoritative source on the **design** of raw string literals and explains why backslashes are not removed when the raw string literal is parsed. This is summarized in the first paragraph of my answer. – Mathias Rav May 23 '17 at 01:17
  • His question is not about a backslash _at the end_ of a string. This is an entirely different topic. – DYZ May 23 '17 at 01:20
  • You're right, that's not what OP asked. I've edited my answer down. I think the pointer to the Python FAQ is important as an official source on the design of r-strings. – Mathias Rav May 23 '17 at 01:30
  • Thanks for the edit and note. I have a utility that parses and makes every kind of string literal that exists. The operation is inversely equivalent. Even Dot-Net's `@""` is reversible and is correct. Like I said I'm not interested in getting `"word's"` into a string variable. I am focused on raw string syntax only. If this is how it is in python, it is _useless_ and apparently is _by design_. I'll just pass on them and go on to the next language. –  May 23 '17 at 02:22
  • I'm sorry that you're not using raw string literals for their designed purpose. Regular string literals can represent any string, and triple-quoted raw string literals can represent any string without escaping except if it contains both `'''` and `"""`. Hope this helps. – Mathias Rav May 23 '17 at 04:31
1

That's not a bug, that's intended behavior. When using r you're telling the interpreter to interpret your string, well, raw - that means turn off all escape sequences and treat the backslash as an ordinary char:

> Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially.

Since the backslash is treated as a literal character, `r' "word\'s" '` is equivalent to writing `' "word\\\'s" '`. Your double-quoted string escapes a different character, so `r" \"word's\" "` is equivalent to `' \\"word\'s\\" '`. Hence they don't match: the second has one more backslash, and the backslashes sit in different positions.
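The equivalences can be verified directly. This is a small sketch in Python 3 that reuses `v1` and `v2` from the question:

```python
v1 = r' "word\'s" '
v2 = r" \"word's\" "

# The equivalent non-raw spellings given above.
assert v1 == ' "word\\\'s" '
assert v2 == ' \\"word\'s\\" '

# Length and number of backslashes in each value.
print(len(v1), v1.count("\\"))  # 11 1
print(len(v2), v2.count("\\"))  # 12 2

# The backslashes also sit at different indices.
print([i for i, ch in enumerate(v1) if ch == "\\"])  # [6]
print([i for i, ch in enumerate(v2) if ch == "\\"])  # [1, 9]
```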

Unfortunately, since string literals must be single- or double-quoted, you must escape single quotes in a single-quoted string and double quotes in a double-quoted string to avoid a syntax error, but the `r` prefix tells the interpreter to keep every backslash literally, including the one used for that escape. Besides, r was never intended for string operation anyway.

zwer
  • Thanks. I'm glad you said it `Besides, r was never intended for string operation anyway`. It is apparently, utterly useless. –  May 23 '17 at 02:23