7

I need to split a string into a list of parts in Ruby, but I need to ignore commas inside parentheses. For example:

A +4, B +6, C (hello, goodbye) +5, D +3

I'd like the resulting list to be:

[0] A +4
[1] B +6
[2] C (hello, goodbye) +5
[3] D +3

But I can't simply split on commas, because that would also split the contents of the parentheses. Is there a way to split the string without first converting the commas inside the parentheses into something else?

Thanks.

Colen

2 Answers

13

Try this:

s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
tokens = s.scan(/(?:\(.*?\)|[^,])+/)
tokens.each {|t| puts t.strip}

Output:

A +4
B +6
C (hello, goodbye) +5
D +3

A short explanation:

(?:        # open non-capturing group 1
  \(       #   match '('
  .*?      #   reluctantly match zero or more characters other than line breaks
  \)       #   match ')'
  |        #   OR
  [^,]     #   match something other than a comma
)+         # close non-capturing group 1 and repeat it one or more times

Another option is to split on a comma followed by optional whitespace, but only when the first parenthesis seen when looking ahead is an opening parenthesis (or there is no parenthesis at all, i.e. the lookahead reaches the end of the string):

s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
tokens = s.split(/,\s*(?=[^()]*(?:\(|$))/)
tokens.each {|t| puts t}

This produces the same output, but I find the scan method cleaner.
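For a quick side-by-side check, the two approaches can be wrapped in small methods and compared. This is only a sketch: the method names `split_with_scan` and `split_with_lookahead` are made up for illustration, while the regexes are the ones shown in this answer.

```ruby
# Hypothetical helper names; the regexes are copied from the answer.

# Collect runs of "parenthesised group OR non-comma character".
def split_with_scan(str)
  str.scan(/(?:\(.*?\)|[^,])+/).map(&:strip)
end

# Split on commas whose lookahead reaches '(' or end of string
# before it reaches any ')'.
def split_with_lookahead(str)
  str.split(/,\s*(?=[^()]*(?:\(|$))/)
end

s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
p split_with_scan(s)
p split_with_lookahead(s)
```

Both calls should print the same four-element array for this input. Neither version is meant to handle nested parentheses.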

Bob Aman
Bart Kiers
  • # => ["A +4", " B +6", " C (hello, goodbye) +5", " D +3"] Looks perfect to me. Might want to #trim it to remove surrounding whitespace. – Myrddin Emrys Jan 06 '10 at 20:24
  • this does not work for `A +4, B +6, C (hello, (how are you?, bad)goodbye) +5, D +3`. Any idea how to fix it please? – rochb Jan 30 '11 at 18:35
  • @rochb, when an arbitrary number of nested parenthesis come into play, use a proper parser, don't go hacking with regex. – Bart Kiers Jan 30 '11 at 19:08
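For the nested case raised in the comments, one simple non-regex approach is to walk the string and track the parenthesis depth, splitting only on commas at depth zero. The sketch below is an assumption about what a "proper parser" could look like at its most minimal, not something taken from the answers; `split_top_level` is a hypothetical name, and unbalanced input is handled only loosely (a stray `)` is kept but never drives the depth below zero).

```ruby
# Hypothetical helper: split on commas that are not inside parentheses,
# allowing arbitrary nesting.
def split_top_level(str)
  parts = []
  current = String.new
  depth = 0

  str.each_char do |ch|
    case ch
    when '('
      depth += 1
      current << ch
    when ')'
      depth -= 1 if depth > 0 # ignore a stray ')' rather than go negative
      current << ch
    when ','
      if depth.zero?
        parts << current.strip # top-level comma: finish this part
        current = String.new
      else
        current << ch          # comma inside parentheses: keep it
      end
    else
      current << ch
    end
  end

  parts << current.strip
  parts
end

nested = 'A +4, B +6, C (hello, (how are you?, bad)goodbye) +5, D +3'
p split_top_level(nested)
```

For rochb's input this keeps `C (hello, (how are you?, bad)goodbye) +5` as a single element.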
6
string = "A +4, B +6, C (hello, goodbye) +5, D +3"
string.split(/ *, *(?=[^\)]*?(?:\(|$))/)
# => ["A +4", "B +6", "C (hello, goodbye) +5", "D +3"]

How this regex works:

/
   *, *        # find comma, ignoring leading and trailing spaces.
  (?=          # (Pattern in here is matched against but is not returned as part of the match.)
    [^\)]*?    #   lazily match zero or more characters that are not ')'
    (?:        #   <non-capturing parentheses group>
      \(       #     left paren '('
      |        #     - OR -
      $        #     (end of string)
    )
  )
/
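As a small check of the lookahead logic described above, the same pattern can be tried on a string with more than one parenthesised group. The second input string is invented for this sketch; the pattern is the one from this answer.

```ruby
# Pattern copied from the answer; the lookahead is zero-width, so only the
# comma and its surrounding spaces are consumed by split.
pattern = / *, *(?=[^\)]*?(?:\(|$))/

one_group  = "A +4, B +6, C (hello, goodbye) +5, D +3"
two_groups = "A (x, y) +1, B (p, q) +2" # made-up input for illustration

p one_group.split(pattern)
p two_groups.split(pattern)
```

A comma inside parentheses is rejected because the lookahead runs into `)` before it can find `(` or the end of the string, so only the top-level commas act as separators.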
gabriel
  • That may be a bit cryptic without an explanation for the faint hearted regex-enthusiast the OP probably is! :). But a good solution nevertheless. – Bart Kiers Jan 06 '10 at 20:38
  • How does this work? I couldn't find any good documentation about how regex worked with split - like Bart K. says I'm not that great with regexes – Colen Jan 06 '10 at 20:53
  • @Colen, I posted a very similar regex as a second solution including an explanation. – Bart Kiers Jan 06 '10 at 20:54