
Can you please help me understand this behaviour:

>>> a = "abc\\def\\ghi"
>>> a.split(r"\\")
['abc\\def\\ghi']

However, after a few minutes of trying different permutations, I found that this works, at least for now:

>>> a.split("\\")
['abc', 'def', 'ghi']

Can you point me to the literature/design considerations that result in this behaviour?

timgeb
Darshan Pandit
  • This has nothing to do with Regex. The `r` means [raw-string](http://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-flags-do-in-python-and-what-are-raw-string-l), not Regex pattern. –  Jul 14 '14 at 19:06
  • Found the answer at http://stackoverflow.com/questions/2241600/python-regex-r-prefix – Darshan Pandit Jul 14 '14 at 19:10
  • To expand on that a little: `r"\\"` is two backslashes; backslashes aren't treated as an escape character in a raw string. `"\\"` is one backslash. – Tom Zych Jul 14 '14 at 19:10
  • Adding to what @iCodez said, you might often find raw strings used for regexes because the backslash is often used in a regex, and it's a pain to double them up all the time. – Mark Ransom Jul 14 '14 at 19:10
  • Thanks everybody. Stackoverflow is awesome! :) – Darshan Pandit Jul 14 '14 at 19:12

1 Answer


Your string contains regular, single backslashes which have been escaped:

>>> a = "abc\\def\\ghi"
>>> a
'abc\\def\\ghi'
>>> print(a)
abc\def\ghi

When you split by "\\", the first backslash escapes the second, so the separator is a single backslash and you get a list of three elements: ['abc', 'def', 'ghi']. When you split by r"\\", you are splitting by two backslashes, because the r prefix is Python's raw string notation (which has nothing to do with regexes). The important thing here is that in a raw string literal, backslashes are treated as literal characters rather than as escape characters.
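A quick way to check this is to compare the lengths of the two separators and the result of splitting with each. This is a minimal sketch using only built-in string methods; the variable names `one_backslash` and `two_backslashes` are just illustrative.

```python
a = "abc\\def\\ghi"  # the string contains two single backslashes

one_backslash = "\\"     # normal literal: escaped, so one character
two_backslashes = r"\\"  # raw literal: no escaping, so two characters

print(len(one_backslash))         # prints 1
print(len(two_backslashes))       # prints 2
print(two_backslashes == "\\\\")  # prints True

# a contains the single-backslash separator twice, so it splits into three parts
print(a.split(one_backslash))     # prints ['abc', 'def', 'ghi']
# a never has two backslashes in a row, so nothing is split
print(a.split(two_backslashes))   # prints ['abc\\def\\ghi']
```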

The reason you often see the r prefix on regular expression patterns is that backslashes also have a special meaning in regexes. A raw string saves you from escaping every backslash twice, once for Python's string literal and once for the regex engine.
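To illustrate with the standard `re` module: to match one literal backslash, the regex engine must receive the two-character pattern \\. As a normal string literal that has to be written "\\\\", while the raw string r"\\" expresses the same pattern directly. A minimal sketch:

```python
import re

a = "abc\\def\\ghi"  # the string contains two single backslashes

# Both literals produce the same two-character regex pattern \\,
# which the regex engine reads as "one literal backslash".
pattern_escaped = "\\\\"  # normal literal: four backslashes in the source
pattern_raw = r"\\"       # raw literal: two backslashes in the source

print(pattern_escaped == pattern_raw)  # prints True
print(re.split(pattern_raw, a))        # prints ['abc', 'def', 'ghi']
print(re.split(pattern_escaped, a))    # prints ['abc', 'def', 'ghi']

# "\\" is a lone backslash, which is an incomplete escape for the regex engine
try:
    re.split("\\", a)
except re.error as exc:
    print("re.error:", exc)
```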

Some further reading with regard to regular expressions: The Backslash Plague (a section of the Python Regular Expression HOWTO)

timgeb