8

Let's say I want to parse a string with various opening and closing brackets (I used parentheses in the title because I believe it is more common -- the question is the same nevertheless) so that I get all the higher levels separated in a list.

Given:

[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]

I want:

List("[hello:=[notting],[hill]]", "[3.4(4.56676|5.67787)]", "[the[hill[is[high]]not]]")

The way I am doing this is by counting the opening and closing brackets and adding to the list whenever I get my counter to 0. However, I have an ugly imperative code. You may assume that the original string is well formed.

My question is: what would be a nice functional approach to this problem?

Notes: I have thought of using the for...yield construct but given the use of the counters I cannot get a simple conditional (I must have conditionals just for updating the counters as well) and I do not know how I could use this construct in this case.

Tomasz Nurkiewicz
  • 334,321
  • 69
  • 703
  • 674
PhyBandit
  • 303
  • 2
  • 7
  • See "parser combinators": http://stackoverflow.com/search?q=scala+parser+combinators – Alexander Azarov Apr 04 '12 at 14:15
  • 1
    A similar case: http://blog.tmorris.net/haskell-scala-java-7-functional-java-java/. The code in the comments is the most useful bit. – Luigi Plinge Apr 04 '12 at 14:40
  • @AlexanderAzarov, everytime I play with parser combinators I feel like I will need more experience with it to be proficient to get a solution in an almost certain time. Is it overkill here? – huynhjl Apr 04 '12 at 15:17
  • Certainly, parser combinators is quite generic tool. When you know how to use it, you'll be able to come up with a reliable solution pretty quickly and with more readable code. As usual, this is a tradeoff ("maintenance costs" vs. "performance"), as a hand-written algorithm will most likely be more performant. – Alexander Azarov Apr 05 '12 at 07:51

5 Answers5

7

Quick solution using Scala parser combinator library:

import util.parsing.combinator.RegexParsers

object Parser extends RegexParsers {
  lazy val t = "[^\\[\\]\\(\\)]+".r

  def paren: Parser[String] =
    ("(" ~ rep1(t | paren) ~ ")" |
     "[" ~ rep1(t | paren) ~ "]") ^^ {
      case o ~ l ~ c => (o :: l ::: c :: Nil) mkString ""
    }

  def all = rep(paren)

  def apply(s: String) = parseAll(all, s)
}

Checking it in REPL:

scala> Parser("[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]")
res0: Parser.ParseResult[List[String]] = [1.72] parsed: List([hello:=[notting],[hill]], [3.4(4.56676|5.67787)], [the[hill[is[high]]not]])
Alexander Azarov
  • 12,971
  • 2
  • 50
  • 54
  • It seems so easy when you do it. I've spent like a couple hours overall on this and what I had was more or less like that: `"[" ~ rep(paren ~ opt(t)) ~ "]" | "[" ~ rep(t ~ opt(paren)) ~ "]"`. – huynhjl Apr 05 '12 at 03:16
4

What about:

def split(input: String): List[String] = {
  def loop(pos: Int, ends: List[Int], xs: List[String]): List[String] =
    if (pos >= 0)
      if ((input charAt pos) == ']') loop(pos-1, pos+1 :: ends, xs)
      else if ((input charAt pos) == '[')
        if (ends.size == 1) loop(pos-1, Nil, input.substring(pos, ends.head) :: xs)
        else loop(pos-1, ends.tail, xs)
      else loop(pos-1, ends, xs)
    else xs
  loop(input.length-1, Nil, Nil)
}

scala> val s1 = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
s1: String = [hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]

scala> val s2 = "[f[sad][add]dir][er][p]"
s2: String = [f[sad][add]dir][er][p]

scala> split(s1) foreach println
[hello:=[notting],[hill]]
[3.4(4.56676|5.67787)]
[the[hill[is[high]]not]]

scala> split(s2) foreach println
[f[sad][add]dir]
[er]
[p]
kiritsuku
  • 52,967
  • 18
  • 114
  • 136
  • +1, my answer was based on the specific case, and not the generic one, which clearly requires a more complex, and apparently iterative, solution. – virtualeyes Apr 04 '12 at 14:33
2

Given your requirements counting the parenthesis seems perfectly fine. How would you do that in a functional way? You can make the state explicitly passed around.

So first we define our state which accumulates results in blocks or concatenates the next block and keeps track of the depth:

case class Parsed(blocks: Vector[String], block: String, depth: Int)

Then we write a pure function that processed that returns the next state. Hopefully, we can just carefully look at this one function and ensure it's correct.

def nextChar(parsed: Parsed, c: Char): Parsed = {
  import parsed._
  c match {
    case '[' | '(' => parsed.copy(block = block + c,
                                  depth = depth + 1)
    case ']' | ')' if depth == 1 
                   => parsed.copy(blocks = blocks :+ (block + c),
                                  block = "",
                                  depth = depth - 1)
    case ']' | ')' => parsed.copy(block = block + c,
                                  depth = depth - 1)
    case _         => parsed.copy(block = block + c)
  }
}

Then we just used a foldLeft to process the data with an initial state:

val data = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
val parsed = data.foldLeft(Parsed(Vector(), "", 0))(nextChar) 
parsed.blocks foreach println

Which returns:

[hello:=[notting],[hill]]
[3.4(4.56676|5.67787)]
[the[hill[is[high]]not]]
huynhjl
  • 41,520
  • 14
  • 105
  • 158
2

You have an ugly imperative solution, so why not make a good-looking one? :)

This is an imperative translation of huynhjl's solution, but just posting to show that sometimes imperative is concise and perhaps easier to follow.

  def parse(s: String) = {
    var res = Vector[String]()
    var depth = 0
    var block = ""
    for (c <- s) {
      block += c
      c match {
        case '[' => depth += 1
        case ']' => depth -= 1
                    if (depth == 0) {
                      res :+= block
                      block = ""
                    }
        case _   =>
      }
    }
    res
  }
Luigi Plinge
  • 50,650
  • 20
  • 113
  • 180
0

Try this:

val s = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
s.split("]\\[").toList

returns:

List[String](
  [hello:=[notting],[hill],
  3.4(4.56676|5.67787),
  the[hill[is[high]]not]]
)
virtualeyes
  • 11,147
  • 6
  • 56
  • 91
  • See this example: "[f[sad][add]dir][er][p]". What you suggested clearly does not solve the general case. Besides, I am not looking for a quick fix, I have a perfectly working code. – PhyBandit Apr 04 '12 at 12:12
  • was operating on the specific case supplied. @hyunhjl's answer in particular is quite interesting... – virtualeyes Apr 04 '12 at 16:02