Looking at the spans returned from my regex matches, I noticed that the end index is always one past the last matched character; e.g. in this example from the Regular Expression HOWTO:

>>> print(p.match('::: message'))
None
>>> m = p.search('::: message'); print(m)  
<_sre.SRE_Match object at 0x...>
>>> m.group()
'message'
>>> m.span()
(4, 11)

The resulting span in the example is (4, 11) vs. the actual location (4, 10). This causes some trouble for me as the left-hand and right-hand boundaries have different meanings and I need to compare the relative positions of the spans.

Is there a good reason for this or can I go ahead and modify the spans to my liking by subtracting one from the right boundary?

jonrsharpe
Toaster

1 Answer

Because in Python, the end value of slices and ranges is always exclusive, and '::: message'[4:11] returns exactly the matched text:

>>> '::: message'[4:11]
'message'
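
As a small illustration of why the exclusive end is convenient (a sketch only; the variable names are illustrative and not from the HOWTO): the length of the match is simply end minus start, and slicing at the same boundaries splits the string with nothing lost or duplicated.

```python
s = '::: message'
start, end = 4, 11  # the span reported in the question

# Length of the matched text is just end - start, with no +1 correction.
length = end - start

# Slicing at the same boundaries splits the string cleanly into three parts.
before, matched, after = s[:start], s[start:end], s[end:]
rejoined = before + matched + after

# A slice whose start equals its end is empty, so a span (x, x) is a valid
# way to describe "nothing, at position x".
empty = s[4:4]

print(length, repr(matched), rejoined == s, repr(empty))
```

With an inclusive end such as (4, 10), each of these operations would need a +1 adjustment somewhere.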

Thus, you can use the MatchObject.span() results to slice the matched text from the original string:

>>> import re
>>> p = re.compile('[a-z]+')  # the pattern used in the HOWTO example
>>> s = '::: message'
>>> match = p.search(s)
>>> match.span()
(4, 11)
>>> s[slice(*match.span())]
'message'
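
For comparing the relative positions of spans, as the question asks, the half-open form can usually be used as-is rather than subtracting one. The following is a minimal sketch; `spans_overlap` and `spans_adjacent` are hypothetical helper names, not part of the `re` module.

```python
import re

def spans_overlap(a, b):
    """Return True if half-open spans a and b share at least one index."""
    return a[0] < b[1] and b[0] < a[1]

def spans_adjacent(a, b):
    """Return True if span b starts exactly where span a ends."""
    return a[1] == b[0]

s = '::: message'
colons = re.search(':+', s).span()     # the leading colons
space = re.search(' ', s).span()       # the single space
word = re.search('[a-z]+', s).span()   # the word 'message'

print(colons, space, word)
print(spans_overlap(colons, word))     # colons and word do not overlap
print(spans_adjacent(space, word))     # the word starts where the space ends
```

Because the end is exclusive, "b starts right after a" is a plain equality test, and the overlap check needs only strict comparisons.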
Martijn Pieters
  • I see, so the designers wanted a way to slice nothing while still providing indexes, i.e. they wanted 'message'[x:x] to return empty. – Toaster Oct 10 '14 at 10:34
  • @Colin: exactly; also see [Python's slice notation](http://stackoverflow.com/q/509211) for some helpful diagrams. – Martijn Pieters Oct 10 '14 at 10:45