
I want to split a string by colon.

This is an example of input:

str = "one[two:[three::four][five::six]]:seven:eight[nine:ten]"

This is an example of output:

array = ["one[two:[three::four][five::six]]", "seven", "eight[nine:ten]"]

The aim is to find a regex that matches a colon only when it is outside square brackets, including nested ones.

But there are some constraints:

  • The template of regex must be like this: ^(.+)<colon_regex>(.*)<colon_regex>(.*)$
  • The match must be unique, with three groups.

Can you give me a suggestion?

Matthew Schuchard
BnG
  • So, the first constraint means `one:two:three:four` should yield no match, right? – Wiktor Stribiżew Feb 26 '17 at 23:34
  • Thanks Cary. Just edited. – BnG Feb 26 '17 at 23:41
  • Wiktor, the input str = "one:two:three:four" must produce the output array = ["one", "two", "three", "four"], but with this template: ^(.+)(.*)(.*)(.*)$ – BnG Feb 26 '17 at 23:47
  • Can't you just use a special check after you get all the matches? See http://ideone.com/xOPItz where that constraint is implemented with the `chunk_count` variable. – Wiktor Stribiżew Feb 26 '17 at 23:58
  • No, I can't. I need a one-line regex, or rather, Puppet needs a one-line regex. – BnG Feb 27 '17 at 00:16
  • @BnG Well, it's *impossible*, so you're going to be disappointed. Regex has limits. See https://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns – user229044 Feb 27 '17 at 00:23
  • @Wiktor, I usually use /:(?=[^\]]*(?:\[|$))/ to match a colon outside brackets or nested brackets. I hoped to combine this regex with the template to capture the content of each (.*), but so far without success. – BnG Feb 27 '17 at 00:35
  • I do not know how to write a regex like this for splitting logic. https://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns is not relevant for Ruby regex as it supports recursion. – Wiktor Stribiżew Feb 27 '17 at 00:41
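
A quick check of the lookahead from BnG's comment may help show where it stops working. This is only a sketch: the `flat` sample string and the `split_on_lookahead` helper are made up for illustration and are not part of the question. The lookahead accepts any colon that is directly followed by "[", so a colon sitting just before a nested bracket is treated as top-level.

```ruby
# The lookahead from the comment above: a colon followed by a run of
# non-"]" characters and then either "[" or the end of the line.
LOOKAHEAD_COLON = /:(?=[^\]]*(?:\[|$))/

# Hypothetical helper, only here to make the two cases easy to compare.
def split_on_lookahead(str)
  str.split(LOOKAHEAD_COLON)
end

flat   = "one:two[a:b]:three"
nested = "one[two:[three::four][five::six]]:seven:eight[nine:ten]"

p split_on_lookahead(flat)
  #=> ["one", "two[a:b]", "three"]

# With nesting, the colon after "two" is immediately followed by "[",
# so the lookahead succeeds and the string is cut inside the brackets.
p split_on_lookahead(nested).first
  #=> "one[two"
```

So the pattern seems adequate for one level of brackets, but not for the nested input in the question.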

2 Answers


You can use a very simple regex:

SUB_CHAR = 0.chr
  #=> "\x00"
r = /#{SUB_CHAR}/
  #=> /\x00/

to be used in s.split(r).

There is, of course, a catch: you must modify the string you pass to Puppet (along with the regex above).

str = "one[two:[three::four][five::six]]:seven:eight[nine:ten]"

count = 0

idx = str.size.times.with_object([]) do |i,a|
  case str[i]
  when '[' then count += 1
  when ']' then count -= 1
  when ':' then a << i if count.zero?
  end
end
  #=> [33, 39]

s = str.dup
  #=> "one[two:[three::four][five::six]]:seven:eight[nine:ten]"
idx.each { |i| s[i] = SUB_CHAR }
s #=> "one[two:[three::four][five::six]]\u0000seven\u0000eight[nine:ten]"
s.split(r)
  #=> ["one[two:[three::four][five::six]]", "seven", "eight[nine:ten]"] 
Cary Swoveland
  • Thanks. Unfortunately, I need a regex solution because I will use the Ruby regex in the title_patterns method of a Puppet module (which accepts only a regex). – BnG Feb 27 '17 at 00:13
  • I modified my answer to give you what I believe you need, though a bit klutzy, sometimes you do what you gotta do. – Cary Swoveland Feb 27 '17 at 01:49
  • I don't know Puppet (or Rails generally), so my assumption that both the string and the regex are passed to Puppet may be incorrect. – Cary Swoveland Feb 27 '17 at 02:38
  • Puppet is a software configuration manager that uses Ruby to implement some features, but in certain cases it imposes constraints. – BnG Feb 27 '17 at 06:21
  • BnG, will you be passing both the string and the regex to Puppet? If so, would what I have proposed work? – Cary Swoveland Feb 27 '17 at 08:45
  • Yes, I can pass only the string and the regex, but I can't override the business logic. So, unfortunately, I need a regex. It's too hard to solve the problem with these constraints. I have a big headache.... – BnG Feb 27 '17 at 11:02

Adapting this nested parenthesis regex, you can do:

txt = "one[two:[three::four][five::six]]:seven:eight[nine:ten]"
pat = Regexp.new('((?>[^:\[]+|(\[(?>[^\[\]]+|\g<-1>)*\]))+)')
puts txt.scan(pat).map(&:first)
# one[two:[three::four][five::six]]
# seven
# eight[nine:ten]
dawg
  • ...ehm... OK, it works. But I can't call the map method (a limitation imposed by Puppet's title_patterns method), and it parses any string. Instead, the input should be parsed by a regex like ^(.+)(.*)(.*)$ – BnG Feb 27 '17 at 01:09
  • @BnG: Without recursion, it is not possible purely with a regex. – dawg Feb 27 '17 at 01:23
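
Since Ruby's regex engine supports recursion through subexpression calls (`\g<name>`), a single anchored pattern in the spirit of the template can at least be sketched. Everything named here (`TOP_LEVEL_TRIPLE`, `top_level_fields`, and the groups `br`, `first`, `second`, `third`) is hypothetical and chosen for illustration. Note also two assumptions: the helper group `br` is an extra named group beyond the three fields, and whether Puppet's title_patterns accepts a pattern like this has not been verified here.

```ruby
# Sketch only: one anchored regex with three top-level fields.
TOP_LEVEL_TRIPLE = /
  (?<br> \[ (?: [^\[\]] | \g<br> )* \] ){0}    # helper: a balanced bracket group, defined but matched zero times here
  \A
  (?<first>  (?: [^:\[\]] | \g<br> )+ ) :      # field 1: non-empty, like (.+) in the template
  (?<second> (?: [^:\[\]] | \g<br> )* ) :      # field 2: may be empty, like (.*)
  (?<third>  (?: [^:\[\]] | \g<br> )* )        # field 3: may be empty, like (.*)
  \z
/x

# Returns the three fields, or nil when the string does not have
# exactly two top-level colons (or its brackets are unbalanced).
def top_level_fields(str)
  m = TOP_LEVEL_TRIPLE.match(str)
  m && [m[:first], m[:second], m[:third]]
end

p top_level_fields("one[two:[three::four][five::six]]:seven:eight[nine:ten]")
  #=> ["one[two:[three::four][five::six]]", "seven", "eight[nine:ten]"]
p top_level_fields("one:two:three:four")
  #=> nil
```

Each field accepts either a character that is not a colon or bracket, or a whole balanced bracket group, so colons inside brackets never end a field. The second call returns nil because three top-level colons cannot fit a three-field pattern, which matches the "no match" reading in the first comment on the question.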