I understand that to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\".
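A quick check of the two spellings, as a minimal sketch using only the standard `re` module. Both literals should produce the same two-character string (backslash, backslash), which the regex engine reads as "match one literal backslash". The sample path `C:\temp` is just an illustrative input.

```python
import re

raw_pattern = r"\\"      # raw string: two characters, backslash + backslash
plain_pattern = "\\\\"   # normal string: each "\\" collapses to one backslash

print(raw_pattern == plain_pattern)  # True: identical patterns
print(len(raw_pattern))              # 2

text = "C:\\temp"  # the value C:\temp, which contains exactly one backslash
match = re.search(raw_pattern, text)
print(match.group() == "\\")         # True: the match is a single backslash
```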

When I saw the code string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string), I wondered what the backslash means in \' and \`, since the code also works with plain ' and `, as in string = re.sub(r"[^A-Za-z0-9(),!?'`]", " ", string). Is there any need for the backslash here?
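One way to check this empirically is to run both patterns over the same input and compare. This is a sketch; the sample sentence is made up for illustration. Inside a character class, `\'` and `` \` `` are treated as the literal characters ' and `, so both classes describe the same set, and neither class contains the backslash itself.

```python
import re

escaped = r"[^A-Za-z0-9(),!?\'\`]"   # pattern from the question
unescaped = r"[^A-Za-z0-9(),!?'`]"   # same class without the backslashes

sample = "it's a `test` -- isn't it?"
a = re.sub(escaped, " ", sample)
b = re.sub(unescaped, " ", sample)

print(a == b)  # True: both replace the same characters
print(a)       # the two hyphens are replaced by spaces, quotes survive
```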

I tried some examples in Python:

  1. str1 = "\'s"
    print(str1)
    str2 = "'s"
    print(str2)
    

    Both print 's. I think this might be why the earlier code uses \'\` in string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string). Is there any difference between "\'s" and "'s"?

  2. string = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
    re.match(r"\\", string)
    

    re.match returns None, which suggests there is no backslash in the string. However, I can see backslashes in the literal. Is the backslash in \' not actually a backslash?
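A small sketch that inspects the value directly instead of going through a regex. The backslashes visible in the source code are part of the `\'` escape sequence, so they are consumed when Python parses the literal and never reach the string value.

```python
s = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'

print("\\" in s)      # False: the value holds no backslash
print(s.count("'"))   # 2: each \' became a plain apostrophe
print(list("\'s"))    # ["'", 's']: two characters, no backslash
```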

TylerH
Panfeng Li
  • Mind that you use a *raw* string for the regex. – Willem Van Onsem Jul 24 '17 at 19:19
  • Some characters are considered _special_ in some regex contexts, and for a literal match you have to escape them with a backslash. When people are not sure (read - they don't really know regex ;)) whether a certain character should be escaped, they tend to over-escape everything that looks to them like a 'special character'. Neither `'` nor `` ` `` is a special character, so neither needs escaping. When you escape a non-special punctuation character, it's still treated as a literal match. – zwer Jul 24 '17 at 19:23
  • @zwer Thanks. What about the difference between `"\'s"` and `"'s"`? Are they the same? – Panfeng Li Jul 24 '17 at 19:26
  • @PanfengLi - why don't you try it out: `"\'s" == "'s"`. The rule is largely the same with Python strings as well - escaping a character that needs no escaping will usually just be ignored. If you don't want it ignored (in Python context) you can force Python to treat your backslashes as literal (so it will be escaping your backslashes automatically) so `r"\'s" == r"'s"` will give you a different result. – zwer Jul 24 '17 at 19:27
  • @zwer Okay, thanks again, it returns `True`. I understand that. – Panfeng Li Jul 24 '17 at 19:30
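A short sketch of the comparison suggested in the comments. `\'` is a recognized escape sequence, so a normal literal drops the backslash, while a raw literal keeps it. (Per the Python language reference, unrecognized escape sequences are left in the string unchanged, backslash included, so "ignored" really applies to recognized escapes such as `\'`.)

```python
plain = "\'s"   # normal literal: \' is an escape, the backslash is dropped
raw = r"\'s"    # raw literal: the backslash is kept as a real character

print(plain == "'s")          # True
print(raw == r"'s")           # False
print(len(plain), len(raw))   # 2 3
```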

2 Answers

In Python, those are escape sequences: a quote character can have a meaning in code other than how it appears on screen (for example, a string can be delimited by single quotes). You can see all of the Python string literal escape sequences here, but the reason no backslash was found in that string is that \' is an escaped single quote: the backslash is consumed when the literal is parsed, leaving only '. Although the escape isn't necessary inside a double-quoted string, it is still valid syntax because it is sometimes needed (for example, inside a single-quoted string).
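To illustrate when the escape is needed and when it is merely allowed, here is a minimal sketch using a fragment of the question's sentence:

```python
single = 'jackson\'s'      # escape required: a bare ' would end the literal
double = "jackson's"       # no escape needed inside double quotes
also_fine = "jackson\'s"   # valid but redundant escape

print(single == double == also_fine)  # True: all three are the same value
```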

Mark R

Check out https://docs.python.org/2.0/ref/strings.html for a better explanation.

The problem with your second example is that string isn't a raw string, so \' is interpreted as just '. Compare a normal string with a raw one:

>>> import re
>>> not_raw = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
>>> res1 = re.search(r'\\',not_raw)
>>> type(res1)
<type 'NoneType'>
>>> raw = r'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
>>> res2 = re.search(r'\\',raw)
>>> type(res2)
<type '_sre.SRE_Match'>

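re.search is used here instead of re.match because re.match only looks for a match at the beginning of the string. For a fuller explanation of re.match vs re.search, see: What is the difference between Python's re.search and re.match?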
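A Python 3 sketch of the same point, reusing part of the question's sentence as a raw literal so the backslash is really present. Even then, re.match returns None because the string does not start with a backslash. On Python 3.7 and later, `type(m)` displays as `<class 're.Match'>` rather than the Python 2 `<type '_sre.SRE_Match'>` shown in the transcript.

```python
import re

raw = r"peter jackson\'s expanded vision"  # raw literal keeps the backslash

print(re.match(r"\\", raw))   # None: match only tries position 0
m = re.search(r"\\", raw)     # search scans the whole string
print(m.start())              # index of the backslash, right after "jackson"
```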

dashiell