1

I'm trying to decode the following string:

body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}'
body << '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

I need the string to split at the pipes, but not where a pipe is contained within square brackets. To do this I think I need a lookahead, as described here: How to split string by ',' unless ',' is within brackets using Regex?

My attempt (still splits at every pipe):

x = self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/ *\|(?!\]) */)}
->
[
  ["type:paragaph", "class:red", "content:[class:intro", "body:This is the introduction paragraph.][body:This is the second paragraph.]"]
  ["type:image", "class:grid", "content:[id:1", "title:image1][id:2", "title:image2][id:3", "title:image3]"]
]

Expecting:

   ->
    [
      ["type:paragaph", "class:red", "content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]"]
      ["type:image", "class:grid", "content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]"]
    ]

Does anyone know the regex required here?

Is it possible to adapt the regex from this question? I can't seem to modify it correctly: Regular Expression to match underscores not surrounded by brackets?


I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:

 self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}

Seems to do the trick, though I'm not sure whether it has any shortfalls.

Ryan King
  • Your attempt at splitting by pipe, but not a contained one, is only looking ahead by one character, so doesn't see the pipes. If you make it look ahead more, you need to also assert there is no opening bracket. You also need to assert there is no previous opening bracket. At this stage, it's worth thinking about collecting the parsed structure in a different way . . . – Neil Slater Mar 31 '13 at 08:07
  • Can brackets appear in a context other than to encapsulate bits of the input string? i.e. Is `this|[is a string]|that uses an orphan ]` valid? – Kenneth K. Mar 31 '13 at 08:22
  • No they're only used in the above context. – Ryan King Mar 31 '13 at 09:41
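
The point in the first comment can be checked directly. This is only a small sketch on a shortened sample; the constant and method names (`ITEM`, `naive_split`, `plain_split`) are made up for illustration. The negative lookahead `(?!\])` only rejects a pipe that is immediately followed by `]`, which never occurs in this format, so it splits exactly like a plain pipe split:

```ruby
# Shortened sample in the same format as the question.
ITEM = 'type:image|class:grid|content:[id:1|title:image1][id:2|title:image2]'

# The pattern from the question: the lookahead inspects one character only.
def naive_split(text)
  text.split(/ *\|(?!\]) */)
end

# The same pattern with the lookahead removed.
def plain_split(text)
  text.split(/ *\| */)
end

p naive_split(ITEM)
puts naive_split(ITEM) == plain_split(ITEM) # the lookahead changes nothing here
```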

2 Answers

3

Dealing with nested structures that have identical syntax is going to make things difficult for you.

You could try a recursive descent parser (a quick Google turned up https://github.com/Ragmaanir/grammy - not sure if any good)

Personally, I'd go for something really hacky - some gsubs that convert your string into JSON, then parse with a JSON parser :-). That's not particularly easy either, though, but here goes:

require 'json'

b1 = body.gsub(/([^\[\|\]\:\}\{]+)/,'"\1"').gsub(':[',':[{').gsub('][','},{').gsub(']','}]').gsub('}{','},{').gsub('|',',')


JSON.parse('[' + b1 + ']')  

It wasn't easy because the string format apparently uses [foo:bar][baz:bam] to represent an array of hashes. If you have a chance to modify the serialised format to make it easier, I would take it.
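
To make the intermediate result visible, here is the same chain of gsubs run on a shortened sample. The method name `to_json_text` is only for illustration, and the approach assumes that keys and values never contain quotes, backslashes, or any of the delimiter characters:

```ruby
require 'json'

# Hypothetical wrapper around the gsub chain above.
def to_json_text(body)
  body.gsub(/([^\[\|\]\:\}\{]+)/, '"\1"') # quote every key and value
      .gsub(':[', ':[{')                  # open the first hash in a list
      .gsub('][', '},{')                  # boundary between hashes in a list
      .gsub(']', '}]')                    # close the last hash in a list
      .gsub('}{', '},{')                  # boundary between top-level hashes
      .gsub('|', ',')                     # pipes become commas
end

sample = '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2]}'

puts to_json_text(sample)
p JSON.parse('[' + to_json_text(sample) + ']')
```

Note that every value comes back as a string, so `id` is `"1"` rather than `1`.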

Neil Slater
  • Could you recommend a better string format to achieve the same results? The format of the string is flexible. The goal is to have an array of hashes, where the content hash is also intended to contain another array of hashes. – Ryan King Mar 31 '13 at 10:06
  • Is it possible to just use JSON? It's slightly more verbose than your format, but it can easily serialise and de-serialise the structures you need, and parsers are available in many languages already. – Neil Slater Mar 31 '13 at 10:19
  • I modified the answer here http://stackoverflow.com/questions/2015826/split-string-in-ruby-ignoring-contents-of-parentheses to get `self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}` seems to do the trick. – Ryan King Mar 31 '13 at 12:03
1

I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:

 self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}

Seems to do the trick. If it has any shortfalls please suggest something better.
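
As far as I can tell, it works because the lookahead `(?=[^\[\]]*(?:\[|$))` only accepts a pipe when the next bracket after it is an opening `[`, or when there is no bracket left before the end of the string. A pipe inside `[...]` always reaches a closing `]` first, so it is skipped. Here is a self-contained check against the sample from the question (the names `OUTSIDE_BRACKETS_PIPE` and `split_blocks` are just for this sketch, and it assumes brackets are never nested):

```ruby
# Matches a pipe only if the next bracket after it (if any) is an opening "[".
OUTSIDE_BRACKETS_PIPE = /\|\s*(?=[^\[\]]*(?:\[|$))/

# Hypothetical helper: split every {...} block on its top-level pipes.
def split_blocks(body)
  body.scan(/\{(.*?)\}/).map { |m| m[0].split(OUTSIDE_BRACKETS_PIPE) }
end

body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}' \
       '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

split_blocks(body).each { |fields| p fields }
```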

Ryan King
  • Provided there is no deeper nesting of `{}` or `[]` in your structure, I think it will be fine. You still aren't finished parsing the data though, unless you are able to use the result as-is. Next step I'd guess is to split each item on ':' (first one only), after that detect embedded lists and repeat your idea, but using something like `scan(/\[(.*?)\]/)` . . . – Neil Slater Mar 31 '13 at 15:29
  • That's what I've got. Thanks for your help. – Ryan King Mar 31 '13 at 21:54
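
The remaining steps outlined in the comment above could look roughly like this. It is a sketch, not necessarily what was used in the end: `TOP_LEVEL_PIPE`, `parse_fields` and `parse_body` are hypothetical names, and it assumes a single level of `[...]` nesting.

```ruby
# Pipes that are not inside [...] (the pattern from this answer).
TOP_LEVEL_PIPE = /\|\s*(?=[^\[\]]*(?:\[|$))/

# Turn "key:value|key:value" into a hash.
def parse_fields(text)
  pairs = text.split(TOP_LEVEL_PIPE).map do |pair|
    # Split on the first ":" only, so values may contain colons.
    key, value = pair.split(':', 2)
    # A value starting with "[" is an embedded list of [key:value|...] groups.
    if value&.start_with?('[')
      value = value.scan(/\[(.*?)\]/).map { |m| parse_fields(m[0]) }
    end
    [key, value]
  end
  pairs.to_h
end

# Parse every {...} block in the body into a hash.
def parse_body(body)
  body.scan(/\{(.*?)\}/).map { |m| parse_fields(m[0]) }
end

p parse_body('{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2]}')
```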