
I have source code written in a new, small programming language:

 method M(n: int) returns (r: int)
  ensures r == n;
{
  var i := 0;
  while (i < n)
  {
    i := i + 1;
  }
  r := i;
}

I want to read this source file (a single file without any dependencies) using Java and create XML describing the function name, input parameters, return types, the ensures keyword, etc.

To do that, I think I need to analyse the source code and perhaps build some kind of tree structure to get a hierarchical view.

Is there a framework that would let me define the keywords, analyse this kind of material and generate XML from it, or should I just read the file line by line and try to write the parser myself?

My main purpose here is to represent this code in XML format in order to generate UML-like diagrams. I am not aiming to create a new compiler or language. (My question was not clear enough; I hope this makes it clearer.)

Can
  • Why are you trying to generate XML? I mean, don't you want to do some processing on your language parse tree before generating some XML from it? – James Oct 25 '13 at 20:18
  • And if generating an XML file is that important, you should really provide a sample XML output that matches the sample code you gave above. – James Oct 25 '13 at 20:21
  • I want to visualize this language; that's why I thought generating XML would help me. – Can Oct 25 '13 at 20:48
  • Visualize this language? What do you mean by that? I understand from an older post that this is the Dafny language. And you previously referred to BlueJ, which seems to have a database visualizer; that is something I know about. But visualizing the above language, I'm not certain. Do you want to display the structure of the code? Or are you talking about simple syntax highlighting? – James Oct 25 '13 at 20:55
  • Consider the code above: I will draw a box, as in a UML class diagram, and show the input types for this method inside the box, etc. In other words, I want to represent this method as a UML-like diagram. – Can Oct 25 '13 at 20:59
  • Ok, now things are becoming clearer. Then again, why XML? Why not use a graph library directly from the parse tree? Or generate a dot file, and let graphviz layout and output the graphic as an image? – James Oct 25 '13 at 21:03
  • But even though you might not want to generate a compiler, you will still need a complete parser for the language. That is, unless the only thing that actually matters is the first line of the sample file ("method M(n: int) returns (r: int)"). That line alone could be handled with regexes. Otherwise, you will need to account for the language structure. – James Oct 25 '13 at 21:07
  • Again, I am sorry for the confusion. A tree-like structure works just as well; I should have said XML or tree. The important thing is that I need to categorize return types, input parameters, etc. somehow. – Can Oct 25 '13 at 21:08

5 Answers

1

Your description is a bit vague, but it sounds like you're looking for a library for parsing a custom language and converting it into another format. You might start with ANTLR. Also, if you are building Java objects from your input, you might consider JAXB for marshalling them to XML.
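To make the "Java objects to XML" step concrete, here is a minimal sketch. It does not use JAXB, which needs annotated classes and is no longer bundled with the JDK since Java 11; instead it swaps in the DOM and Transformer APIs that ship with the JDK. The class name `MethodXmlWriter` and the element and attribute names are assumptions chosen for illustration, not a schema from the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds a small XML document for one parsed method.
class MethodXmlWriter {
    static String toXml(String name, String paramName, String paramType,
                        String resultName, String resultType, String ensures) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            // <method name="..."> is the root element (assumed layout).
            Element method = doc.createElement("method");
            method.setAttribute("name", name);
            doc.appendChild(method);

            Element param = doc.createElement("parameter");
            param.setAttribute("name", paramName);
            param.setAttribute("type", paramType);
            method.appendChild(param);

            Element result = doc.createElement("returns");
            result.setAttribute("name", resultName);
            result.setAttribute("type", resultType);
            method.appendChild(result);

            // The ensures clause is kept as plain text; the serializer escapes it.
            Element post = doc.createElement("ensures");
            post.setTextContent(ensures);
            method.appendChild(post);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("M", "n", "int", "r", "int", "r == n"));
    }
}
```

The values passed to `toXml` would come from whatever parser you end up using; this sketch only covers the output side.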

bedwyr
  • Thanks a million for your answer, bedwyr. The purpose here is to generate a graphical representation from the given custom source code, but as far as I know ANTLR is used to create compilers for custom languages, like Coco/R, isn't it? – Can Oct 25 '13 at 20:53
  • ANTLR is a library which can lex and process languages. How it's used is up to you. I worked on a project where it was used to convert an XPath query-like language into SQL. From its main site: "ANTLR...is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. " – bedwyr Oct 25 '13 at 21:09
1

You can use the parser generator ANTLR for that. The process is to define the language as a grammar consisting of rules; ANTLR uses an EBNF-like notation for this. For each rule the parser recognizes, you can specify an action in Java that says what to do; in your case, write some XML tags to a stream.

Vertex
1

Before you can think about generating an XML file, the first step in doing what you describe is definitely to parse the input document. Regexes are not a good candidate for that job, and hand-made parsers are difficult to design, especially for languages that support some form of operator precedence.

Here are three good libraries to develop parsers for whatever language you may design. They are not all equivalent, though, so choosing among them should be guided by the kind of language you are designing.

Using any of these, you will describe your language structure and keywords, then code to be run when each element is found. You will then add code to create a parse tree (or you may let the engine generate one for you). Then, you may write code to work upon that parse tree, and possibly, a visitor to output it to XML.
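The last step, walking a parse tree and emitting XML, can be sketched independently of the parser library. The `Node` class below is a hypothetical stand-in for whatever tree type your chosen tool produces, and the element names are assumptions; it is an illustration of the recursive walk, not the API of any of the libraries above.

```java
import java.util.ArrayList;
import java.util.List;

class ParseTreeToXml {
    // Hypothetical parse-tree node: a kind (used as the element name),
    // optional leaf text, and child nodes.
    static final class Node {
        final String kind;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String kind) { this(kind, ""); }
        Node(String kind, String text) { this.kind = kind; this.text = text; }

        Node add(Node child) { children.add(child); return this; }
    }

    // Minimal escaping for element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Depth-first walk: leaves print their text, inner nodes print their children.
    static void write(Node node, int depth, StringBuilder out) {
        String indent = "  ".repeat(depth); // String.repeat needs Java 11+
        out.append(indent).append('<').append(node.kind).append('>');
        if (node.children.isEmpty()) {
            out.append(escape(node.text));
        } else {
            out.append('\n');
            for (Node child : node.children) {
                write(child, depth + 1, out);
            }
            out.append(indent);
        }
        out.append("</").append(node.kind).append(">\n");
    }

    static String toXml(Node root) {
        StringBuilder out = new StringBuilder();
        write(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Tree built by hand here; a real parser would construct it.
        Node method = new Node("method")
                .add(new Node("name", "M"))
                .add(new Node("parameter", "n: int"))
                .add(new Node("returns", "r: int"))
                .add(new Node("ensures", "r == n"));
        System.out.print(toXml(method));
    }
}
```

The same walk could just as well feed a graph library or a dot file instead of XML, since only `write` knows about the output format.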

By the way, if the exact structure of your language is still undefined, you may use any of the "parser generator" tools mentioned above. In that case, if you are an Eclipse user, I suggest that you try XText first, as it will generate an Eclipse editor with autocompletion support, refactoring support, etc. All for free.

Update: XText can also be used to generate a graphical editor for your language, provided that it makes sense. Have a look here for an example: http://vimeo.com/12824804.

James
1

This is not a trivial subject (if you want to do it right). You are going to need to do most of the stages of writing a compiler (minus the part that actually writes out machine code).

See this thread for lots of info to get started: Learning to write a compiler

Making a compiler is a really rewarding experience, but it is a lot of work.

Once you create a parse tree, you'll be able to export it to XML. But that part will come a lot later.

Chill
0

Assuming that ONLY the header line of each method is important, here is a completely different strategy.

for each line in your input file
    // With the /x flag, literal spaces in the pattern are ignored, so whitespace is matched explicitly with \s
    if (line matches regex /^ \s* method \s+ ([a-zA-Z][a-zA-Z0-9_]*) \s* \( ([^)]*) \) \s* returns \s* \( ([^)]*) \) /x )
        // The line is a method header. Extract its parts.
        currentMethodName = group(1);
        currentArguments = group(2);
        currentReturnType = group(3);

        methods.add(new MethodDefinition(...));
    end if
end for


for (method : methods) {
    // Generate XML for that method...
}
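A runnable Java version of the pseudocode above might look like the following sketch. The class `MethodHeaderScanner`, the fields of `MethodDefinition`, and the XML layout are all assumptions made for illustration. The output is built by plain string concatenation with no escaping, which is only adequate for simple headers like the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MethodHeaderScanner {
    // Hypothetical holder for one parsed header; field names are illustrative.
    static final class MethodDefinition {
        final String name;
        final String parameters;
        final String returns;

        MethodDefinition(String name, String parameters, String returns) {
            this.name = name;
            this.parameters = parameters;
            this.returns = returns;
        }
    }

    // Matches: method <name>(<parameters>) returns (<results>)
    private static final Pattern HEADER = Pattern.compile(
            "^\\s*method\\s+([a-zA-Z][a-zA-Z0-9_]*)\\s*\\(([^)]*)\\)"
            + "\\s*returns\\s*\\(([^)]*)\\)");

    // Collects every line that looks like a method header; other lines are skipped.
    static List<MethodDefinition> scan(List<String> lines) {
        List<MethodDefinition> methods = new ArrayList<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                methods.add(new MethodDefinition(
                        m.group(1), m.group(2).trim(), m.group(3).trim()));
            }
        }
        return methods;
    }

    // Assumed XML layout; no escaping is performed.
    static String toXml(MethodDefinition d) {
        return "<method name=\"" + d.name + "\">"
                + "<parameters>" + d.parameters + "</parameters>"
                + "<returns>" + d.returns + "</returns>"
                + "</method>";
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                " method M(n: int) returns (r: int)",
                "  ensures r == n;",
                "{");
        for (MethodDefinition d : scan(source)) {
            System.out.println(toXml(d));
        }
    }
}
```

In a real program the lines would come from something like `Files.readAllLines`; the `ensures` clause and the method body are ignored here, which is exactly the limitation of this approach.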

Is this approach more suited to your expectations and needs?

James
  • Thank you jwatkinds for the code snippet. There is actually more of this language that I need to visualize, but your point is "use regular expressions and parse the code file according to your requirements using the Java standard libraries". (I just want to make it clear for other people as well.) – Can Oct 25 '13 at 21:25
  • I still think the best approach would be to build a complete parser, even if you do not use it as a compiler afterward. What I'm saying here is that, apart from parser generators, there are basically two other strategies for parsing files. Regexes might work if you don't need to understand potentially recursive structures. Hand-made parsers seem easy to create at first, but generally end up being buggy and almost impossible to maintain, unless you understand parser generation theory very well. So yes, here is a snippet for the regex approach, should you want to go that way. – James Oct 25 '13 at 21:37