4

By default, String#split works as follows:

"id,name,title(first_name,last_name)".split(",")

This gives the following output:

["id", "name", "title(first_name", "last_name)"]

But I want something like the following:

["id", "name", "title(first_name,last_name)"]

So I use the following regex (from this answer) with split to get the desired output:

"id,name,title(first_name,last_name)".split(/,(?![^(]*\))/)

But when I use another string, which is my actual input, the logic fails. My actual string is:

"id,name,title(first_name,last_name,address(street,pincode(id,code)))"

and it gives the following output:

["id", "name", "title(first_name", "last_name", "address(street", "pincode(id,code)))"]

rather than

["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
nemesv
fidato
  • @nemesv Thanks. Is there any way I can achieve the same in Ruby? Any idea? – fidato Jan 01 '18 at 13:15
  • String#split supports regex, so you just need to take the pattern from the linked question: `"id,name,title(first_name,last_name)".split(/,(?![^(]*\))/)` – nemesv Jan 01 '18 at 13:31
  • @nemesv The above regex fails with the following input: "id,name,title(first_name,last_name,address(street,pincode(id,code)))" Do you have any idea how to resolve it? – fidato Jan 01 '18 at 15:20
  • @fidato Don't use a regex. You're trying to parse a nested data structure, not split a string by a delimiter. Regex is the wrong tool for this job. – user229044 Jan 01 '18 at 16:58
  • @meagar Is there any other way I can do the same thing, or do I need to write my own logic for it? – fidato Jan 01 '18 at 17:04
  • @fidato You could write a simple parser for this in just a few lines of code. You just need to keep track of opening `(` and closing `)`. – user229044 Jan 01 '18 at 17:13
  • Possible duplicate of [Regex to match only commas not in parentheses?](https://stackoverflow.com/questions/9030036/regex-to-match-only-commas-not-in-parentheses) – Toto Jan 01 '18 at 18:04
  • @Toto We've established this is not a duplicate of that question. It has already been closed as a duplicate of it and reopened, and it contains the solution provided there and specifically shows the case for which it doesn't work. – user229044 Jan 01 '18 at 18:15

3 Answers

3

Updated Answer

Since the earlier answer didn't take care of all the cases as rightly pointed out in the comments, I'm updating the answer with another solution.

This approach replaces each valid (top-level) comma with a separator character, |, and then splits the string on that separator using String#split. It assumes the input never contains | itself.

class TokenArrayParser
  SPLIT_CHAR = '|'.freeze

  def initialize(str)
    @str = str
  end

  def parse
    separate_on_valid_comma.split(SPLIT_CHAR)
  end

  private

  def separate_on_valid_comma
    dup = @str.dup
    paren_count = 0
    dup.length.times do |idx|
      case dup[idx]
      when '(' then paren_count += 1
      when ')' then paren_count -= 1
      when ',' then dup[idx] = SPLIT_CHAR if paren_count.zero?
      end
    end

    dup
  end
end

%w(
  id,name,title(first_name,last_name)
  id,name,title(first_name,last_name,address(street,pincode(id,code)))
  first_name,last_name,address(street,pincode(id,code)),city(name)
  a,b(c(d),e,f)
  id,name,title(first_name,last_name),pub(name,address)
).each {|str| puts TokenArrayParser.new(str).parse.inspect }

# =>
# ["id", "name", "title(first_name,last_name)"]
# ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
# ["first_name", "last_name", "address(street,pincode(id,code))", "city(name)"]
# ["a", "b(c(d),e,f)"]
# ["id", "name", "title(first_name,last_name)", "pub(name,address)"]

I'm sure this can be optimized more.

hallucinations
  • Thanks for the answer, but it isn't giving me my desired result either. I want the following output: ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"] – fidato Jan 01 '18 at 18:36
  • Your code gives me an array of 5 strings, whereas I need only 3 strings, that is, `id`, `name` and `title(first_name,last_name,address(street,pincode(id,code)))` – fidato Jan 01 '18 at 18:41
  • @fidato, sorry about that. Edited. Could you check now? – hallucinations Jan 01 '18 at 18:42
  • @fidato, my pleasure! – hallucinations Jan 01 '18 at 18:46
  • One more modification: can you tweak the above logic to convert "first_name,last_name,address(street,pincode(id,code)),city(name)" into ["first_name", "last_name", "address(street,pincode(id,code))", "city(name)"]? – fidato Jan 01 '18 at 18:53
  • This fails with input like `a,b(c(d),e,f)`. It returns `["a", "b(c(d)", "e", "f)"]` instead of `["a", "b(c(d),e,f)"]`. – user229044 Jan 01 '18 at 19:00
  • This is a really ill-advised implementation depending on a [feature of Ruby](http://code.extension.ws/post/201717938/the-ruby-conditional-range) that is best avoided. It requires every closing parenthesis to be encountered in the same token. – user229044 Jan 01 '18 at 19:15
  • I can delete my answer based on comments. @fidato, please reject this answer. – hallucinations Jan 01 '18 at 19:17
  • Another limitation: `split_unless_parenthesized("id,name,title(first_name,last_name),pub(name,address)") #=> ["id", "name", "title(first_name,last_name),pub(name,address)"]` rather than `["id", "name", "title(first_name,last_name)" ,"pub(name,address)"]`. – Cary Swoveland Jan 01 '18 at 22:14
  • Updated my answer with another approach that I believe takes care of all these cases mentioned in the problem and comments. – hallucinations Jan 02 '18 at 00:46
3
def doit(str)
  split_here = 0.chr
  stack = 0
  s = str.gsub(/./) do |c|
    ret = c
    case c
    when '('
      stack += 1
    when ','
      ret = split_here if stack.zero?
    when ')'
      raise(RuntimeError, "parens are unbalanced") if stack.zero?
      stack -= 1
    end
    ret
  end
  raise(RuntimeError, "parens are unbalanced, stack at end=#{stack}") if stack > 0
  s.split(split_here)
end

doit "id,name,title(first_name,last_name)"
  #=> ["id", "name", "title(first_name,last_name)"]
doit "id,name,title(first_name,last_name,address(street,pincode(id,code)))"
  #=> ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
doit "a,b(c(d),e,f)"
  #=> ["a", "b(c(d),e,f)"]
doit "id,name,title(first_name,last_name),pub(name,address)"
  #=> ["id", "name", "title(first_name,last_name)", "pub(name,address)"]
doit "a,b(c)d),e,f)"
  #=> RuntimeError: parens are unbalanced
doit "a,b(c(d),e),f("
  #=> RuntimeError: parens are unbalanced, stack at end=1

A comma is split upon if and only if stack is zero when it is encountered. Each such comma is changed to a character (split_here) that is assumed not to appear in the string (I used 0.chr), and the string is then split on split_here.
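
The same depth-counting idea can also be written without a placeholder character, which removes the assumption that the placeholder never occurs in the input. The following is only a sketch; the method name split_top_level is hypothetical and does not appear in either answer, and it raises ArgumentError rather than RuntimeError by choice:

```ruby
# Hypothetical helper (not from the answers above): splits on top-level commas
# by building the tokens directly instead of substituting a placeholder.
def split_top_level(str)
  tokens = [+'']   # unary plus gives an unfrozen, mutable string
  depth = 0        # current parenthesis nesting level

  str.each_char do |c|
    case c
    when '('
      depth += 1
    when ')'
      raise ArgumentError, 'parens are unbalanced' if depth.zero?
      depth -= 1
    end

    if c == ',' && depth.zero?
      tokens << +''      # top-level comma: start a new token
    else
      tokens.last << c   # everything else belongs to the current token
    end
  end

  raise ArgumentError, "parens are unbalanced, depth at end=#{depth}" if depth > 0
  tokens
end

p split_top_level("a,b(c(d),e,f)")
```

Unlike String#split, this sketch keeps empty fields, so an empty string yields [""] and a trailing comma yields a trailing "". Whether that is desirable depends on the input format.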

Cary Swoveland
  • My answer originally had `stack` defined as an array. I pushed `'('` onto `stack` when `'('` was encountered and popped one `'('` when `')'` was encountered. After reviewing @hallucinations' revised answer I realized I was only using the size of `stack`, so I simplified my answer to make `stack` an integer counter. – Cary Swoveland Jan 02 '18 at 01:48
  • I learned something else from @hallucinations' revised answer which caused me to make a further edit. I had replaced commas that are not to be split upon with a special character, split on the remaining commas and then replaced the special characters with commas. It makes more sense to replace the commas to be split upon with the special character, which is what hallucinations has done, avoiding the need for my last step. – Cary Swoveland Jan 02 '18 at 06:38
-1

This could be one approach:

"id,name,title(first_name,last_name)".split(",")[0..1] << "id,name,title(first_name,last_name)".split(",")[-2..-1].join(",")

This splits the string twice, then appends the last two elements of the second split (re-joined with a comma) to the first two elements of the first split. It gives the desired result only in this specific scenario, where the parenthesized field contains exactly one comma.
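
If the input really does always start with exactly two plain fields followed by a single parenthesized field, as in the question's examples, the optional limit argument of String#split is a less fragile way to get the same effect. This is a sketch under that assumption, not a general solution:

```ruby
str = "id,name,title(first_name,last_name,address(street,pincode(id,code)))"

# A limit of 3 stops splitting after the second comma, so everything after it
# (including nested commas) stays together in the third field.
fields = str.split(",", 3)
p fields
# => ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
```

It still breaks as soon as the layout changes. For example, "first_name,last_name,address(street,pincode(id,code)),city(name)" would keep address(...) and city(name) together in the last field, so counting parentheses remains the more robust route.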

L.E.H