
Python's re.split seems to have a rather surprising behavior when the pattern contains a group:

>>> re.split("\+|-", "1+2")
['1', '2']

>>> re.split("(\+|-)", "1+2")
['1', '+', '2']

I haven't found a satisfying explanation for why wrapping a single expression in a group would stop it from being treated as a separator. What is going on here?

According to a regex101 test, there is no difference at all in what gets matched, although the grouped version takes more steps.
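The regex101 observation can also be checked locally with a small sketch that compares the match positions of both patterns on an assumed sample string, "1+2-3". If the spans are identical, the grouping is not changing what is matched, only what the pattern records.

```python
import re

text = "1+2-3"  # assumed sample input with two separators
plain = re.compile(r"\+|-")
grouped = re.compile(r"(\+|-)")

# Positions (start, end) of every separator each pattern finds.
plain_spans = [m.span() for m in plain.finditer(text)]
grouped_spans = [m.span() for m in grouped.finditer(text)]
print(plain_spans)    # [(1, 2), (3, 4)]
print(grouped_spans)  # [(1, 2), (3, 4)]

# The only difference: the second pattern has one capturing group.
print(plain.groups, grouped.groups)  # 0 1
```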

Jan Schultke
    See http://stackoverflow.com/questions/2136556/in-python-how-do-i-split-a-string-and-keep-the-separators. And [`re.split` docs](https://docs.python.org/2/library/re.html#re.split): *If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.* – Wiktor Stribiżew Apr 04 '17 at 16:10
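Following from the docs passage quoted in that comment: if the parentheses are only needed for grouping, a non-capturing group (?:...) groups the alternation without adding its text to the result. A short comparison on an assumed sample string:

```python
import re

text = "1+2-3"  # assumed sample input

# Non-capturing group: grouping only, separators are dropped.
without_seps = re.split(r"(?:\+|-)", text)
# Capturing group: the matched separators are kept in the result.
with_seps = re.split(r"(\+|-)", text)

print(without_seps)  # ['1', '2', '3']
print(with_seps)     # ['1', '+', '2', '-', '3']
```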

1 Answer


When you add the parentheses, you create a capturing group, and re.split includes whatever that group matched in the resulting list.

For example:

>>> re.split("(a\+|-)", "1a+2")
['1', 'a+', '2']

Here the captured text, a+, is placed between the two pieces it separated.
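Note that it is the group's text, not necessarily the whole match, that ends up in the list. A quick check (same input as above) moving the a outside the parentheses:

```python
import re

# The group covers the whole separator, so all of it is returned.
whole = re.split(r"(a\+|-)", "1a+2")
# The group covers only part of the separator, so only that part is returned.
part = re.split(r"a(\+|-)", "1a+2")

print(whole)  # ['1', 'a+', '2']
print(part)   # ['1', '+', '2']
```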

What happens is that re.split takes the text of every capturing group and inserts it into the result list between the pieces it separates, as in this example:

>>> re.split("(a)(\+|-)", "1a+2")
['1', 'a', '+', '2']
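To make the mechanism concrete, here is a minimal sketch that rebuilds the result from re.finditer. The function name split_keeping_groups is hypothetical; it ignores maxsplit and flags, and it illustrates the documented behavior rather than CPython's actual implementation.

```python
import re

def split_keeping_groups(pattern, text):
    """Hypothetical helper: mimics re.split(pattern, text) with maxsplit=0."""
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, text):
        # Text between the previous separator and this one.
        pieces.append(text[last_end:match.start()])
        # Text of every capturing group, in order (None if a group did not take part).
        # With no groups, match.groups() is empty and nothing is added.
        pieces.extend(match.groups())
        last_end = match.end()
    # Whatever follows the last separator.
    pieces.append(text[last_end:])
    return pieces

print(split_keeping_groups(r"(a)(\+|-)", "1a+2"))  # ['1', 'a', '+', '2']
print(split_keeping_groups(r"\+|-", "1+2"))        # ['1', '2']
```

One consequence of adding every group: a group that does not participate in a particular match contributes None, so re.split(r"(\+)|(-)", "1+2-3") returns ['1', '+', None, '2', None, '-', '3'].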
Neil