6

I want to split a string by comma:

"a,s".split ','  # => ['a', 's']

I don't want to split a substring if it is wrapped in parentheses:

"a,s(d,f),g,h"

should yield:

['a', 's(d,f)', 'g', 'h']

Any suggestions?

the Tin Man
Naveed

2 Answers

12

To deal with nested parentheses, you can use:

txt = "a,s(d,f(4,5)),g,h"
pattern = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)')
puts txt.scan(pattern).map &:first

pattern details:

(                        # first capturing group
    (?:                  # open a non capturing group
        [^,(]+           # all characters except , and (
      |                  # or
        (                # open the second capturing group
           \(            # (
            (?>          # open an atomic group
                [^()]+   # all characters except parenthesis
              |          # OR
                \g<-1>   # call the nearest capturing group opened before this point (same as \g<2>)
            )*           # close the atomic group
            \)           # )
        )                # close the second capturing group
    )+                   # close the non-capturing group and repeat it
)                        # close the first capturing group

The second capturing group describes nested parentheses: it matches an opening parenthesis, then any mix of non-parenthesis characters and further matches of the group itself, then a closing parenthesis. This makes the pattern recursive.
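
To see the recursion in isolation, the second group can be tried on its own. This is only a minimal sketch: the constant name `BALANCED` and the `\A`/`\z` anchors are added for illustration and are not part of the pattern above.

```ruby
# Hypothetical constant: the second group from the pattern, anchored so that
# it has to match the whole string. \g<1> calls group 1 from inside itself.
BALANCED = /\A(\((?>[^()]+|\g<1>)*\))\z/

nested_ok  = BALANCED.match?("(d,f(4,5))")  # every ( has a matching )
missing_cl = BALANCED.match?("(d,f(4,5)")   # the outer ) is missing

p nested_ok
p missing_cl
```

The first string should be accepted and the second rejected, because the closing `\)` of the outer call has nothing left to match.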

Inside the pattern, you can call a capturing group by its number (\g<2> for the second capturing group), by its relative position (\g<-1> is the first group to the left of the current position in the pattern), or by its name if you use named capturing groups.
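
For the named form, something like the following should work. It is a sketch, and the group names `field` and `paren` are made up for this example. Once a Ruby pattern contains a named group, plain `(...)` groups are no longer captured, so both groups are named here.

```ruby
txt = "a,s(d,f(4,5)),g,h"

# Same pattern as above with named groups (hypothetical names):
#   field = one comma-separated item, paren = one balanced (...) block.
# \g<paren> is the recursive call, replacing \g<-1> / \g<2>.
pattern = /(?<field>(?:[^,(]+|(?<paren>\((?>[^()]+|\g<paren>)*\)))+)/

# scan yields one [field, paren] pair per match; keep only the field.
fields = txt.scan(pattern).map(&:first)
p fields
```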

Note: You can allow unbalanced (single) parentheses if you add |[()] as the last alternative, just before the end of the non-capturing group. Then a,b(,c will give you ['a', 'b(', 'c'].
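
Written out in full, that variant could look like this. It is a sketch: the variable names are illustrative, and the only change from the pattern above is the extra `|[()]` alternative.

```ruby
# The pattern from above with |[()] added as the last alternative of the
# non-capturing group, so a stray parenthesis is kept as an ordinary character.
lenient = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\))|[()])+)')

unbalanced = "a,b(,c".scan(lenient).map(&:first)
nested     = "a,s(d,f(4,5)),g,h".scan(lenient).map(&:first)

p unbalanced
p nested
```

Balanced groups are still tried before the `[()]` fallback, so properly nested input should split the same way as before.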

Casimir et Hippolyte
  • `txt.scan(pattern).map &:first` solved the problem. @casimir I really wish I could up-vote your answer twice, thanks! – Naveed Aug 25 '13 at 00:50
  • 1
    I was curious about the -1 in the regexp. Can you please also explain the `<-1>` part? – Naveed Aug 25 '13 at 01:04
3

Assuming that parentheses are not nested:

"a,s(d,f),g,h"
.scan(/(?:\([^()]*\)|[^,])+/)
# => ["a", "s(d,f)", "g", "h"]
sawa
  • Thanks for the edits and the awesome answer! One issue though: this does not handle nested parentheses. `"a,s(d,f(4,5)),g,h".scan /(?:\([^()]*\)|[^,])+/` => `["a", "s(d", "f(4,5))", "g", "h"]` – Naveed Aug 25 '13 at 00:26
  • "a,s(d,f(4,5)),g,h" should split into ['a', 's(d,f(4,5))', 'g', 'h'] – Naveed Aug 25 '13 at 00:45
  • 3
    It is not an issue. That is expected, as I wrote in the answer. The fact is that your example didn't fully specify nested cases, and I assumed what you didn't want. – sawa Aug 25 '13 at 00:48