6

I am very new to regular expressions and am trying to match the "\" character using Python.

Normally I can escape "\" like this:

print ("\\");
print ("i am \\nit");

Output:

\
i am \nit

But when I use the same escaping in a regex, it doesn't work as I expected:

print (re.findall(r'\\',"i am \\nit"));

and it returns this output:

['\\']

Can someone please explain why?

nitesh sharma
  • It's not good practice to use semicolons in Python. – jamylak Apr 27 '12 at 11:05
  • Same issue, question and answer as http://stackoverflow.com/questions/647769/why-cant-pythons-raw-string-literals-end-with-a-single-backslash – Zeugma Apr 27 '12 at 11:44

4 Answers

17

EDIT: The problem is actually how print works with lists and strings. Printing a list shows the representation (repr) of each string in it, not the string itself, and the representation of a string containing just a backslash is '\\'. So findall is actually finding the single backslash correctly, but print isn't displaying it as you'd expect. Try:

>>> import re
>>> print(re.findall(r'\\',"i am \\nit")[0])
\
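To make the difference concrete, here is a minimal sketch that prints both the list and its only element. The variable name `matches` is just an illustrative choice, not something from the question.

```python
import re

matches = re.findall(r'\\', "i am \\nit")

# Printing a list shows the repr() of each element, so the single
# backslash appears in its escaped, two-character source form.
print(matches)          # ['\\']

# Printing the element itself shows the actual character.
print(matches[0])       # \

# The matched string really is one character long.
print(len(matches[0]))  # 1
```
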

(The following is my original answer, it can be ignored (it's entirely irrelevant), I'd misinterpreted the question initially. But it seems to have been upvoted a bit, so I'll leave it here.)

The r prefix on a string means the string is in "raw" mode; that is, backslashes are not treated as special characters (it doesn't have anything to do with "regex").

However, r'\' doesn't work, because you can't end a raw string with a backslash. As the docs state:

Even in a raw string, string quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character).

But you actually can use a non-raw string to get a single backslash: "\\".
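A small sketch of the points above, assuming Python 3. Since r"\" cannot be written in source code at all, the example builds that text as a string and hands it to the built-in `compile()` to show the parser rejecting it; the names `single`, `double`, `source` and `raw_ok` are illustrative only.

```python
# A non-raw literal with an escaped backslash and a raw literal
# with two backslashes are different strings.
single = "\\"    # one character: a backslash
double = r"\\"   # two characters: two backslashes

print(len(single))  # 1
print(len(double))  # 2

# Build the four characters r"\" as text and try to parse them
# as a Python expression.
source = 'r"\\"'
try:
    compile(source, "<example>", "eval")
    raw_ok = True
except SyntaxError:
    # The backslash escapes the closing quote, so the literal
    # never terminates.
    raw_ok = False

print(raw_ok)  # False
```
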

huon
  • In Python 2.7, `re.compile("\\")` gives the error `error: bogus escape (end of line)`. – Limbo Peng Apr 27 '12 at 11:13
  • To match an actual backslash via regex, you need two backslashes in the regex, then again two backslashes to escape those: `re.compile("\\\\")` produces a regex that matches a single backslash. – Tim Pietzcker Apr 27 '12 at 11:13
  • @LimboPeng, I'd read the question wrong so my original answer was incorrect. – huon Apr 27 '12 at 11:17
  • @dbaupp Oops! I was about to write the same answer - sad :( – Limbo Peng Apr 27 '12 at 11:20
  • Python parses strings in two passes: first it figures out the beginning quote marker and looks to figure out where the end quote marker is - this treats backslashes next to quotes the same way for raw strings; it must, because otherwise you couldn't embed the quote mark in a raw string. In the second pass, the stuff between the quotes is interpreted. – Karl Knechtel Apr 27 '12 at 12:05
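The two layers of escaping mentioned in the comments above can be checked directly. This is a rough sketch, assuming Python 3, where the exact error message for a lone backslash differs from the Python 2.7 one quoted above; `pattern`, `found` and `lone_ok` are hypothetical names.

```python
import re

# Four backslashes in a regular literal and two in a raw literal
# are the same two-character regex: an escaped backslash.
print("\\\\" == r"\\")  # True

pattern = re.compile(r"\\")

# The subject string a\b\c contains two backslashes.
found = pattern.findall("a\\b\\c")
print(len(found))  # 2

# A pattern consisting of a single backslash is an incomplete
# escape, so the regex compiler rejects it.
try:
    re.compile("\\")
    lone_ok = True
except re.error:
    lone_ok = False

print(lone_ok)  # False
```
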
1

Can someone please explain why?

Because re.findall found one match, and the match text consisted of a backslash. It gave you a list with one element, which is a string, which has one character, which is a backslash.

That is written ['\\'] because '\\' is how you write "a string with one backslash" - just like you had to do when you wrote the example code print "\\".
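One way to see that '\\' is simply the source-code spelling of a one-character string is to round-trip it through `repr()` and `ast.literal_eval()`. This is only an illustrative sketch; the names `s`, `shown` and `round_trip` are not from the question.

```python
import ast

s = "\\"          # a string holding a single backslash
shown = repr(s)   # the text you would type to write that string

print(s)           # \
print(shown)       # '\\'
print(len(shown))  # 4 (two quotes plus two backslashes)

# Parsing the representation as a literal gives back the
# original one-character string.
round_trip = ast.literal_eval(shown)
print(round_trip == s)  # True
```
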

Karl Knechtel
0

Note that you're using two different kinds of string literal here: the regular string "a string" and the raw string r"a raw string". Regular string literals observe backslash escaping, so to actually put a backslash in the string, you need to escape it too. Raw string literals treat backslashes like any other character, so you're more limited in which characters you can put in the string (no special characters that need an escape code). In exchange, it's easier to enter things like regular expressions: when a backslash needs to have meaning inside the string itself (for example, to the regex engine), and not just while the literal is being parsed, you don't need to double it up.
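As a quick illustration of the trade-off, the sketch below writes the same regex both ways. The pattern `\d+` and the sample text are made up for the example.

```python
import re

# The same regex written as a regular literal and as a raw literal.
regular = "\\d+"
raw = r"\d+"
print(regular == raw)  # True

# In a regular literal "\n" is one newline character; in a raw
# literal it is a backslash followed by the letter n.
print(len("\n"), len(r"\n"))  # 1 2

# Either spelling of the pattern gives the same matches.
numbers = re.findall(raw, "a1b22")
print(numbers)  # ['1', '22']
```
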

Andrew Aylett
-2

It is unnecessary to escape backslashes in raw strings, unless the backslash immediately precedes the closing quote.

Marcin