I'm stuck. For a couple of days I've been trying to parse this text (see the bottom of the post), but I can't figure some things out. The text is formatted as a tree with fixed-width columns, but the exact column width depends on the widest field.
I'm using Ruby. I first tried the Treetop gem and made some progress, but then decided to try Parslet. I'm using it now and it seems like it should be easier, but it's hard to find detailed documentation for it.
Currently I parse each line individually and build an array of parsed entries, but that's not correct because I lose the structure. I need to parse it recursively and keep track of depth.
I would really appreciate any tips, ideas, or suggestions.
Here's my current code. It works, but all the data is flattened. My current idea is to recurse whenever the current line's start position (i.e. width) is greater than the previous line's, since that means we should go one level deeper. I actually managed to get that working, but then I couldn't get back out to the shallower levels properly, so I removed that code.
require 'pp'
require 'parslet'
require 'parslet/convenience'

class TextParser < Parslet::Parser
  # Width of the tree prefix seen on the previous sub-line.
  @@width = 5

  root :text

  rule(:text) { (line >> newline).repeat }
  rule(:line) { left >> ( topline | subline ).as(:entry) }

  rule(:topline) {
    float.as(:number) >> str('%') >> space >> somestring.as(:string1) >>
      space >> specialstring.as(:string2) >> space >> specialstring.as(:string3)
  }

  rule(:subline) {
    dynamic { |source, context|
      # Length of the captured prefix, not counting a trailing '|'.
      width = context.captures[:width].to_s.length
      width = width-1 if context.captures[:width].to_s[-1] == '|'
      if width > @@width
        # should be recursive
        result = ( specialline | lastline | otherline | empty )
      else
        result = ( specialline | lastline | otherline | empty )
      end
      @@width = width
      result
    }
  }

  rule(:otherline) {
    somestring.as(:string1)
  }
  rule(:specialline) {
    float.as(:number) >> str('%') >> dash >> space? >> specialstring.as(:string1)
  }
  rule(:lastline) {
    float.as(:number) >> str('%') >> dash >> space? >> str('[...]')
  }
  rule(:empty) {
    space?
  }

  # Tree prefix: leading whitespace and '|' columns, then optional dashes.
  rule(:left) { seperator.capture(:width) >> dash?.capture(:dash) >> space? }

  rule(:somestring) { match['0-9A-Za-z\.\-'].repeat(1) }
  rule(:specialstring) { match['0-9A-Za-z&()*,\.:<>_~'].repeat(1) }
  rule(:space) { match('[ \t]').repeat(1) }
  rule(:space?) { space.maybe }
  rule(:newline) { space? >> match('[\r\n]').repeat(1) }
  rule(:seperator) { space >> (str('|') >> space?).repeat }
  rule(:dash) { space? >> str('-').repeat(1) }
  rule(:dash?) { dash.maybe }
  rule(:float) { (digits >> str('.') >> digits) }
  rule(:digits) { match['0-9'].repeat(1) }
end

parser = TextParser.new
# Read the whole file in binary mode, as before.
contents = File.binread("text.txt")
pp parser.parse_with_debug(contents)
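To make the goal concrete, here is a rough sketch of the nesting step I have in mind, written in plain Ruby rather than Parslet. It assumes each flat entry has already been reduced to a hash with a `:width` (the prefix width) and a `:text`. `build_tree` and both keys are hypothetical names, not part of Parslet or of the parser above, and it ignores continuation lines that belong to the previous entry.

```ruby
# Hypothetical post-processing step: nest flat entries by their prefix width.
# Each entry is assumed to look like { width: Integer, text: String }.
def build_tree(entries)
  # Sentinel root with a width smaller than any real entry, so it is never popped.
  root = { width: -1, text: nil, children: [] }
  stack = [root] # path from the root down to the most recent node

  entries.each do |entry|
    node = { width: entry[:width], text: entry[:text], children: [] }
    # Leave every level that is as deep as, or deeper than, this entry.
    stack.pop while stack.last[:width] >= node[:width]
    stack.last[:children] << node
    stack.push(node)
  end

  root[:children]
end

flat = [
  { width: 0, text: 'a' },
  { width: 4, text: 'b' },
  { width: 8, text: 'c' },
  { width: 4, text: 'd' },
  { width: 0, text: 'e' }
]
tree = build_tree(flat)
```

Here `a` ends up with children `b` and `d`, `c` is nested under `b`, and `e` is a second top-level node. Keeping an explicit stack is what lets the code get back out to a shallower level, which is the part I could not get right inside the grammar.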
The text looks like this (https://gist.github.com/davispuh/4726538):
1.23% somestring specialstring specialstring
|
--- specialstring
|
|--12.34%-- specialstring
| specialstring
| |
| |--12.34%-- specialstring
| | specialstring
| | |
| | |--12.34%-- specialstring
| | --1.12%-- [...]
| |
| --2.23%-- specialstring
| |
| |--12.34%-- specialstring
| | specialstring
| | specialstring
| | |
| | |--12.34%-- specialstring
| | | specialstring
| | | specialstring
| | --1.23%-- [...]
| |
| --1.23%-- [...]
|
--1.05%-- [...]
1.23% somestring specialstring specialstring
2.34% somestring specialstring specialstring
|
--- specialstring
specialstring
specialstring
|
|--23.34%-- specialstring
| specialstring
| specialstring
--34.56%-- [...]
|
--- specialstring
specialstring
|
|--12.34%-- specialstring
| |
| |--100.00%-- specialstring
| | specialstring
| --0.00%-- [...]
--23.34%-- [...]
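For reference, the "start position" I compare between lines can also be computed without Parslet. This is a minimal sketch, assuming that only spaces, tabs and `|` appear before the content and that sibling branches start their dashes in the same column. That holds for the hand-made lines in the usage example, but I have not checked it against every line in the gist. `content_column` is a made-up helper name.

```ruby
# Column of the first character that is not part of the tree drawing
# (space, tab or '|'). Returns nil when the line holds only drawing characters.
def content_column(line)
  line.rstrip.index(/[^ \t|]/)
end

# Hand-made lines with assumed spacing (not copied from the gist).
branch = "   |--12.34%-- specialstring"
last   = "    --1.05%-- [...]"
spacer = "   |"

columns = [branch, last, spacer].map { |l| content_column(l) }
```

With this spacing, `branch` and `last` report the same column, and `spacer` gives `nil`, so lines that contain only `|` can be skipped before comparing depths.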
thanks :)