I am trying to split a string without using regex, in a more idiomatic functional style.
case class Parsed(blocks: Vector[String], block: String, depth: Int)

def processChar(parsed: Parsed, c: Char): Parsed = {
  import parsed._
  c match {
    case '|' if depth == 0 =>
      parsed.copy(block = "", blocks = blocks :+ block, depth = depth)
    case '[' =>
      parsed.copy(block = block + c, depth = depth + 1)
    case ']' if depth == 1 =>
      parsed.copy(block = "", blocks = blocks :+ (block + c), depth = depth - 1)
    case ']' =>
      parsed.copy(block = block + c, depth = depth - 1)
    case _ =>
      parsed.copy(block = block + c)
  }
}
val s = "Str|[ts1:tssub2|ts1:tssub2]|BLANK|[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]|[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]|BLANK |BLANK |[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]|[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]|BLANK |BLANK |[s14|s15]"
val parsed = s.foldLeft(Parsed(Vector(), "", 0))(processChar)
parsed.blocks.size //20
parsed.blocks foreach println
I would expect to get the following result (parsed.blocks.size should be 12).
Str
[ts1:tssub2|ts1:tssub2]
BLANK
[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK
BLANK
[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]
BLANK
BLANK
[s14|s15]
However, the result I am getting is the following (parsed.blocks.size is 20):
Str
[ts1:tssub2|ts1:tssub2]
BLANK
[INT1|X.X.X.X|INT2|BLANK|BLANK||X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK
BLANK
[s2|s3|s4|INT16|INT17]
;[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts1:tssub2||]
;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]
BLANK
BLANK
[s14|s15]
To my understanding, this is a slight variation of the parenthesis-balancing problem. However, in this case a ; acts as a kind of continuation: the bracketed groups on either side of it belong to the same entry.
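To illustrate what is meant by the balancing view, here is a minimal sketch that tracks the bracket depth in front of every character and reports which | characters sit at depth 0. The names demo, depths and topLevelPipes are made up for this illustration, and the sketch assumes the brackets in the input are balanced.

```scala
// Hypothetical short input, used only for illustration.
val demo = "a|[b|c]|d"

// depths(i) is the bracket depth just before character i (one extra trailing element).
val depths = demo.scanLeft(0) {
  case (d, '[') => d + 1
  case (d, ']') => d - 1
  case (d, _)   => d
}

// Indices of the '|' characters that are not nested inside any brackets.
val topLevelPipes = demo.indices.filter(i => demo(i) == '|' && depths(i) == 0)

println(topLevelPipes.mkString(", "))
```

Only the | at depth 0 should act as a separator; a ; never changes the depth, so it simply stays inside whatever entry is currently being built.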
I have two questions:
1) Where do the extra (empty) entries in my result come from? They appear after [ts1:tssub2|ts1:tssub2], after [INT1|X.X.X.X|INT2|BLANK|BLANK||X.X.X.X|[INT3|s1]], after [INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15], and after ;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK].
2) At the moment, [s2|s3|s4|INT16|INT17] and ;[s5|s6|s7|INT18|INT19] end up as two separate entries. They should be merged into the single entry [s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]. The same applies to [[s8|s9|s10|INT20|INT21]|ts1:tssub2||] and ;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]. Any clues on how to do this?
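One possible direction, offered as a sketch rather than a definitive answer: in processChar the case for ] at depth 1 already appends the finished block and resets block to "", so the | that follows appends that empty block as a second entry, and a ; that follows starts a new block. If entries are only closed on a | at depth 0, and the remaining block is appended once the fold finishes, both effects should go away. The names Acc and splitTopLevel below are hypothetical, and the sketch assumes balanced brackets.

```scala
// Accumulator for the fold: finished entries, the entry being built, bracket depth.
final case class Acc(blocks: Vector[String], block: String, depth: Int)

// Hypothetical helper: splits only on '|' seen at bracket depth 0.
def splitTopLevel(s: String): Vector[String] = {
  val end = s.foldLeft(Acc(Vector.empty, "", 0)) { (acc, c) =>
    c match {
      // A top-level separator closes the current entry.
      case '|' if acc.depth == 0 => acc.copy(blocks = acc.blocks :+ acc.block, block = "")
      // Brackets only change the depth; they never close an entry themselves.
      case '[' => acc.copy(block = acc.block + c, depth = acc.depth + 1)
      case ']' => acc.copy(block = acc.block + c, depth = acc.depth - 1)
      // Everything else, including ';' and nested '|', stays in the current entry.
      case _ => acc.copy(block = acc.block + c)
    }
  }
  // The last entry has no trailing '|', so it is appended after the fold.
  end.blocks :+ end.block
}

val sample = "a|[b|c];[d|e]|f |[[g]|h]"
splitTopLevel(sample).foreach(println)
```

On the short sample string, [b|c];[d|e] stays together as one entry because ; is treated like any other character. Under the same assumptions, splitTopLevel(s) on the string from the question should give the 12 entries listed as the expected result, with any trailing spaces inside entries such as "BLANK " preserved.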