
I'm trying to split a string without removing the delimiters, and I'm having trouble doing so. The string I want to split is:

'+ {- 9 4} {+ 3 2}'

and I want to end up with

['+', '{- 9 4}', '{+ 3 2}']

but nothing I've tried has worked. I've searched Google and looked through this Stack Overflow post for answers: Python split() without removing the delimiter

Thanks!

BooBailey
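For the flat case shown in the question (brace groups that contain no further braces), one possible approach is to match the tokens rather than split on delimiters. This is a swapped-in technique, not the accepted answer's: a minimal sketch assuming every brace group is un-nested. `FLAT_TOKEN` is a hypothetical name.

```python
import re

s = '+ {- 9 4} {+ 3 2}'

# Assumption: brace groups never nest. Each match is either a {...} group
# with no braces inside it, or a run of characters that are neither
# whitespace nor braces.
FLAT_TOKEN = re.compile(r"\{[^{}]*\}|[^\s{}]+")

print(FLAT_TOKEN.findall(s))  # ['+', '{- 9 4}', '{+ 3 2}']
```

Because the character class `[^{}]` cannot see past an inner brace, this pattern silently returns the wrong tokens for nested input such as '+ {+ 5 {- 7 2}} {+ 3 2}', so it only covers the non-nested case.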
  • Are the values between curly braces always numbers? – Jared May 16 '13 at 04:30
  • what about nested curlies? – perreal May 16 '13 at 04:32
  • Can you have nested curly braces, e.g. '+ {+ 5 {- 7 2}} {+ 3 2}'? If so, what do you expect to see in your split? This looks to me like you're trying to write a prefix-notation arithmetic expression parser. – Peter DeGlopper May 16 '13 at 04:33
  • @perreal Yes, I will be using nested curlies. – BooBailey May 16 '13 at 04:35
  • @PeterDeGlopper Yes, I'm writing a parser, and I will be handling nested curlies. For that example I want to see ['+', '{+ 5 {- 7 2}}', '{+ 3 2}']. I'm trying to use recursion to handle nesting. Am I approaching this right? – BooBailey May 16 '13 at 04:41
  • You might find this discussion useful: http://stackoverflow.com/questions/5307218/prefix-notation-parsing-in-python - and, yeah, recursive parsing is the right general approach. – Peter DeGlopper May 16 '13 at 04:43
  • @PeterDeGlopper Cool, thanks. I'm expanding beyond arithmetic to include variables, functions, callbacks, and recursion too. I had everything working except recursion and if statements before I ran into trouble with one particular case involving anonymous functions, so I'm rewriting it using regex and ran into trouble with this case. I know a quick and dirty way, but thought there would be a better one. – BooBailey May 16 '13 at 04:52
  • If I remember my theory correctly, it's impossible to get regexps to correctly count nesting levels of things like your curly braces - you may have to parse character by character keeping track of the stack depth yourself, or use a parser generator: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns – Peter DeGlopper May 16 '13 at 04:56
  • @PeterDeGlopper Ah, thanks. So switching to regex was a mistake, then. Thanks so much! – BooBailey May 16 '13 at 05:03
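The suggestion in the comments to parse character by character while tracking the brace depth can be sketched as follows. This is a minimal sketch, assuming that top-level tokens are separated by whitespace; `split_top_level` is a hypothetical helper, not code from the thread. It produces the split the question asks for and keeps nested groups whole.

```python
def split_top_level(expr):
    """Split expr on whitespace that lies outside every {...} group.

    Brace groups are kept whole, braces included, so a nested group such
    as '{+ 5 {- 7 2}}' comes back as a single token.
    """
    tokens = []
    current = []
    depth = 0  # number of '{' currently open
    for ch in expr:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}' in %r" % expr)
        if ch.isspace() and depth == 0:
            # A space at the top level ends the current token.
            if current:
                tokens.append(''.join(current))
                current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '{' in %r" % expr)
    if current:
        tokens.append(''.join(current))
    return tokens


print(split_top_level('+ {- 9 4} {+ 3 2}'))
# ['+', '{- 9 4}', '{+ 3 2}']
print(split_top_level('+ {+ 5 {- 7 2}} {+ 3 2}'))
# ['+', '{+ 5 {- 7 2}}', '{+ 3 2}']
```

For the recursive approach described in the comments, a brace token can be handed back to the same function with its outer braces stripped, e.g. `split_top_level(tok[1:-1])` turns '{+ 5 {- 7 2}}' into ['+', '5', '{- 7 2}'].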

1 Answer


re.split keeps the delimiters when they are captured, i.e., when the part of the pattern that matches them is enclosed in parentheses (a capturing group):

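import re

s = '+ {- 9 4} {+ 3 2}'
# The capturing group makes re.split keep each delimiter it matches;
# the comprehension drops the empty strings and bare spaces.
p = [tok for tok in re.split(r"([+{} -])", s) if tok.strip()]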

will give you

['+', '{', '-', '9', '4', '}', '{', '+', '3', '2', '}']

which, IMO, is what you need to handle nested expressions: a parser can walk these tokens and track the braces itself.
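To illustrate how a recursive parser might consume that token list, here is a minimal recursive-descent sketch. It assumes binary prefix expressions using only the `+` and `-` from the question, and because the tokenizer treats `-` as a delimiter, negative numbers are not supported. `tokenize`, `parse`, `parse_all`, `OPS` and `evaluate` are hypothetical names for illustration only.

```python
import operator
import re


def tokenize(s):
    # Same approach as the answer: the capturing group keeps each delimiter,
    # and empty strings / bare spaces are filtered out.
    return [tok for tok in re.split(r"([+{} -])", s) if tok.strip()]


def parse(tokens, pos=0):
    """Parse one expression starting at tokens[pos].

    Returns (tree, next_pos). A {...} group becomes a nested list,
    so nesting in the input becomes nesting in the output.
    """
    tok = tokens[pos]
    if tok == '}':
        raise ValueError("unexpected '}' at token %d" % pos)
    if tok != '{':
        return tok, pos + 1  # an operator or a number: a leaf
    group = []
    pos += 1  # step past the '{'
    while True:
        if pos >= len(tokens):
            raise ValueError("missing '}'")
        if tokens[pos] == '}':
            return group, pos + 1  # step past the matching '}'
        node, pos = parse(tokens, pos)  # recurse for each element
        group.append(node)


def parse_all(s):
    """Parse every top-level expression in s into a list of trees."""
    tokens = tokenize(s)
    trees, pos = [], 0
    while pos < len(tokens):
        node, pos = parse(tokens, pos)
        trees.append(node)
    return trees


# Hypothetical evaluator covering only the two operators in the question.
OPS = {'+': operator.add, '-': operator.sub}


def evaluate(tree):
    if isinstance(tree, str):
        return int(tree)  # a number leaf
    op, left, right = tree  # assumes binary prefix form: [op, left, right]
    return OPS[op](evaluate(left), evaluate(right))


tree = parse_all('+ {- 9 4} {+ 3 2}')
print(tree)            # ['+', ['-', '9', '4'], ['+', '3', '2']]
print(evaluate(tree))  # 10
```

Returning the next position from `parse` lets each recursive call report how many tokens it consumed, which is what allows arbitrarily deep nesting without any regex having to count braces.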

perreal
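  • The **entire** regex must itself be enclosed in parentheses, e.g. r'((?:\+|-)+)'. Otherwise you only capture a subgroup. Note that re.split returns the text of every capturing group, so with r'((\+|-)+)' the inner group's match would appear in the output as well. – smci Jun 18 '13 at 23:20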
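The point about which parentheses capture can be checked directly: re.split returns the text of every capturing group, so where the parentheses go changes the output. A short demonstration on a made-up string 'a+-b':

```python
import re

s = 'a+-b'

# No capturing group: the delimiters are dropped.
print(re.split(r'[+-]+', s))        # ['a', 'b']

# Whole delimiter pattern captured: the matched text is kept.
print(re.split(r'([+-]+)', s))      # ['a', '+-', 'b']

# Only a repeated subgroup captured: just its last match is kept.
print(re.split(r'(\+|-)+', s))      # ['a', '-', 'b']

# Nested capturing groups: both groups' text is returned.
print(re.split(r'((\+|-)+)', s))    # ['a', '+-', '-', 'b']

# Outer group capturing, inner group non-capturing via (?:...).
print(re.split(r'((?:\+|-)+)', s))  # ['a', '+-', 'b']
```

Making inner groups non-capturing with `(?:...)` is the usual way to keep exactly one copy of each delimiter.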