I want to do something like this in Ruby

I have a text like this

    some_random_text unit 1 some_random_text chap 3 some_random_text

Now I want to extract

    some_random_text, 'unit 1', some_random_text, 'chap 3' 

For this I use an expression like this

    my_string.split(/(unit[1-9 ]+|chap[1-9 ]+)/)

I repeat the pattern [1-9 ]+ for both 'unit' and 'chap' because if I group like

    /((unit|chap)[1-9 ]+)/

It returns

    some_random_text, 'unit', 'unit 1', some_random_text, 'chap', 'chap 3' 

which has extra elements I don't need.

How do I do the grouping I need?

theReverseFlick

1 Answer

Try this:

    my_string.split(/((?:unit|chap)[1-9 ]+)/)

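Your second regex, /((unit|chap)[1-9 ]+)/, contains two capturing groups, (...), and split includes the text matched by every capturing group in the result. Writing the inner group as (?:...) still groups the alternatives but does not capture them, which is why it is called a non-capturing group.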
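
To make the difference concrete, here is a small sketch that runs the three variants on the sample string from the question. The variable names (text, nested, grouped, plain) are only illustrative and not part of the original code.

```ruby
text = "some_random_text unit 1 some_random_text chap 3 some_random_text"

# Two capturing groups: split returns the outer AND the inner match,
# so the bare words "unit" and "chap" show up as extra elements.
nested = text.split(/((unit|chap)[1-9 ]+)/)

# Inner group made non-capturing: only the outer match is returned.
grouped = text.split(/((?:unit|chap)[1-9 ]+)/)

# No capturing group at all: the delimiters are dropped from the result.
plain = text.split(/unit[1-9 ]+|chap[1-9 ]+/)

p nested.length  # 7 elements, two of them unwanted
p grouped        # ["some_random_text ", "unit 1 ", "some_random_text ", "chap 3 ", "some_random_text"]
p plain          # ["some_random_text ", "some_random_text ", "some_random_text"]
```

Note that the tokens keep a trailing space, because [1-9 ]+ also consumes the space after the number; calling grouped.map(&:strip) is one way to tidy that up.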

Also, note that [1-9 ]+ matches any run of spaces and the digits 1 to 9, so it matches all of unit 1 2 4, for example, but never a zero. You may want /((?:unit|chap) +[1-9])/ for a single digit, or /((?:unit|chap) +[1-9][0-9]*)/ for numbers with several digits.
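
As a rough illustration of that difference, the sketch below compares the original character class with the stricter multi-digit pattern. The constant names LOOSE and STRICT and the sample strings are made up for this example.

```ruby
# Original pattern: any run of spaces and the digits 1-9 after the keyword.
LOOSE  = /((?:unit|chap)[1-9 ]+)/
# Stricter pattern: one or more spaces, then a number without a leading zero.
STRICT = /((?:unit|chap) +[1-9][0-9]*)/

# The loose pattern swallows every following digit and space.
p "a unit 1 2 4 b".split(LOOSE)   # ["a ", "unit 1 2 4 ", "b"]
p "a unit 1 2 4 b".split(STRICT)  # ["a ", "unit 1", " 2 4 b"]

# The loose pattern stops at a zero, cutting "unit 10" in half.
p "a unit 10 b".split(LOOSE)      # ["a ", "unit 1", "0 b"]
p "a unit 10 b".split(STRICT)     # ["a ", "unit 10", " b"]
```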

Kobi
  • Works like a champ, can you please explain ?: and how this worked ? – theReverseFlick Feb 13 '11 at 07:19
  • @ShyamLovesToCode - Have a look at: http://stackoverflow.com/questions/3512471/non-capturing-group . The only thing to remember is that in Ruby `split` adds all *captured* groups to the result array. If you have two, you'll get two items in the array. You've already used this one in your regex - note that `/unit[1-9 ]+|chap[1-9 ]+/`, **with no parentheses**, will remove those tokens from your array. – Kobi Feb 13 '11 at 07:25