
I'm looking for a framework that will let me parse Ruby code and transform it into a concrete syntax tree.

I've looked at RubyParser, which goes in the direction I'm interested in, but it gives me an abstract syntax tree instead.

Another approach would be to take apart a tool that builds a CST (maybe Pelusa or something similar).

Do you have any suggestions? It should be written in Ruby, so I can't use the original Ruby parser.

Josh Voigts
romedius

1 Answer


I'm not sure exactly what you are attempting to do, but take a look at Treetop. It lets you define a grammar file and compiles that grammar to a parser in Ruby. It's a PEG parser, so it's also easier to work with than traditional LALR parsers.

Here's an example that parses a bit of Ruby (of course, you will have to extend the grammar to fit your needs, which may be difficult since Ruby is rather complex to parse):

require 'treetop'

# Load the grammar that follows __END__ (read via DATA); this defines TestParser.
Treetop.load_from_string DATA.read

parser = TestParser.new

p parser.parse('def func
   6 + 5
end')

__END__
grammar Test
   rule function
      'def' space function_name function_body 'end'
   end
   rule function_name
      [A-Za-z]+
   end
   rule function_body
      space expression space
   end
   rule expression
      '6 + 5'
   end
   rule space
      [\t \n]+
   end
end

Parsing this returns a syntax tree that retains every matched character, including whitespace:

SyntaxNode+Function0 offset=0, "...ef func\n   6 + 5\nend" (space,function_name,function_body):
  SyntaxNode offset=0, "def"
  SyntaxNode offset=3, " ":
    SyntaxNode offset=3, " "
  SyntaxNode offset=4, "func":
    SyntaxNode offset=4, "f"
    SyntaxNode offset=5, "u"
    SyntaxNode offset=6, "n"
    SyntaxNode offset=7, "c"
  SyntaxNode+FunctionBody0 offset=8, "\n   6 + 5\n" (space1,expression,space2):
    SyntaxNode offset=8, "\n   ":
      SyntaxNode offset=8, "\n"
      SyntaxNode offset=9, " "
      SyntaxNode offset=10, " "
      SyntaxNode offset=11, " "
    SyntaxNode offset=12, "6 + 5"
    SyntaxNode offset=17, "\n":
      SyntaxNode offset=17, "\n"
  SyntaxNode offset=18, "end"
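
To make the shape of that tree concrete without depending on Treetop, here is a minimal sketch of the same toy grammar as a hand-written recursive descent parser. It is not how Treetop works internally; `Node`, `TinyParser`, and the rule labels such as `:keyword` are hypothetical names invented for this illustration, and only the standard library's `StringScanner` is assumed. The point it tries to show is the property that makes the tree "concrete": every character of the input, whitespace included, ends up in exactly one leaf.

```ruby
require 'strscan'

# Hypothetical node type for this sketch (not Treetop's SyntaxNode).
Node = Struct.new(:rule, :offset, :text, :children) do
  # Collect one line per node, indented by depth.
  def dump(depth = 0, out = [])
    out << "#{'  ' * depth}#{rule} offset=#{offset}, #{text.inspect}"
    children.each { |child| child.dump(depth + 1, out) }
    out
  end

  # Leaves in source order; joining their text reproduces the input.
  def leaves
    children.empty? ? [self] : children.flat_map(&:leaves)
  end
end

# Hand-written parser for the toy grammar above (hypothetical class name).
class TinyParser
  SPACE = /[\t \n]+/

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # function <- 'def' space function_name function_body 'end'
  def function
    start = @scanner.pos
    kids = [token(:keyword, /def/), token(:space, SPACE),
            token(:function_name, /[A-Za-z]+/), function_body,
            token(:keyword, /end/)]
    raise "unexpected input at offset #{@scanner.pos}" unless @scanner.eos?
    node(:function, start, kids)
  end

  private

  # function_body <- space expression space
  def function_body
    start = @scanner.pos
    kids = [token(:space, SPACE), token(:expression, /6 \+ 5/),
            token(:space, SPACE)]
    node(:function_body, start, kids)
  end

  # Match one terminal at the current position and wrap it in a leaf node.
  def token(rule, pattern)
    start = @scanner.pos
    text = @scanner.scan(pattern)
    raise "expected #{rule} at offset #{start}" if text.nil?
    Node.new(rule, start, text, [])
  end

  # Build an inner node whose text is the concatenation of its children.
  def node(rule, start, kids)
    Node.new(rule, start, kids.map(&:text).join, kids)
  end
end

source = "def func\n   6 + 5\nend"
tree = TinyParser.new(source).function
puts tree.dump
```

Because nothing is discarded, `tree.leaves.map(&:text).join` gives back the original source, which is what an abstract syntax tree generally cannot do. Unlike the Treetop output, this sketch does not split repetitions such as `[A-Za-z]+` into one node per character.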

You can also compile a Treetop grammar file into Ruby code using the tt command-line tool:

tt test.treetop -o test-treetop.rb
Josh Voigts
  • My problem is rather that I don't have the time and energy to write a grammar plus semantic analysis for Ruby. RubyParser has a good, working grammar for Ruby, so that's already the first big step. The next one is semantic analysis (see http://en.wikipedia.org/wiki/Semantic_analysis_%28compilers%29#Front_end), and that's the missing part for RubyParser. I will add details to the question tomorrow. Thank you for your help. – romedius Oct 09 '12 at 15:17
  • I wouldn't be surprised if it hasn't been done before. Are you hoping to do all of Ruby (kind of like a meta-circular evaluator...)? It doesn't look like a full Treetop grammar exists anyways: [http://stackoverflow.com/a/4055743/1177119](http://stackoverflow.com/a/4055743/1177119) – Josh Voigts Oct 09 '12 at 16:05
  • I know that a few people have tried to implement a Ruby grammar for different parsing frameworks, but they had to give up or hand it over to other people after a certain time. – romedius Oct 09 '12 at 23:03