I'm looking for a way to edit .java files programmatically without resorting to unstable RegExes. Since I'm editing .java files, not .class files I do not want any Byte-code manipulation tools.

I need something that:

  • Is IDE-independent (hence no ASTParser. I want to automate it on CI, so AST is out unless there's a standalone version)
  • Allows me to read a .java file, add an annotation to a method and save it - hence pure source code generation tools (CodeModel comes to mind) are not enough
  • Is not too complicated and/or is dedicated to Java, hence no ANTLR

So in short, something to reproduce this scenario:

File f = new File("path/to/.java");
CodeParser p = CodeParser.parse(f);
Method m = p.getMethods().get(0);
if (m.getBody().contains("abcdef") 
     && m.getAnnotation().getClass().equals(Test.class)){
   m.addAnnotation(MyAnnotation.class);
}
p.saveEdits(f);

I have tried Java reflection, but it can't do it (also since it's byte-code analysis, it can't parse a method's body). Similarly with java model API. I tried to get AST to work standalone but I failed (maybe there is a way?)

If there is absolutely no way or tool to do it, is it possible to do it with regexes in a unique and stable way (i.e. such that no possible Java source code would trigger any operation other than the one in the pseudo-code above)? If so, please give me an example.

Also, I do not need to compile it, after pushing the changes, CI will do it for me.

Cœur
Stye
  • Agreed, @ElliottFrisch's designation of this question as a duplicate is outright wrong. First, the questions are different; OP did not ask anything about Maven. Second, the answers are different; OP wants to know something about how he can reliably *manipulate* the AST, which the other question does not address. – Ira Baxter Jun 12 '14 at 14:03
  • Modifying source code reliably simply isn't possible with regexes. As a fundamental flaw, they can't handle the nesting structure of real code; as a practical flaw, you want to match the code *structure*, not the code *text*. What you need is a program transformation system (PTS): a tool that can parse text to produce ASTs, the ability to inspect and change the AST, and a final capability to prettyprint the AST back to source code. See http://en.wikipedia.org/wiki/Program_transformation for a list of robust PTS. Finally, don't expect your work to be trivial even with a PTS. – Ira Baxter Jun 12 '14 at 14:10
  • ... the good news is that with a robust PTS, you *can* make reliable changes to source code. I'd add a longer answer with an example, if SO will remove the "duplicate" designation on this question. No way to get it into a comment. In the meantime, check my bio. – Ira Baxter Jun 12 '14 at 14:12
  • AFAIK the ASTParser is not dependent on the GUI part of Eclipse. It can be used stand-alone as shown here: http://www.programcreek.com/2011/01/a-complete-standalone-example-of-astparser/ – Robert Jun 12 '14 at 14:16
  • Both your comments are interesting. @Robert I will give this a try, but how can I input an existing .java file into AST without the Document class? Your example seems to generate a file from scratch. Ira Baxter - Thanks for defining the domain of my problem. Some googling has led me to this page: http://www.program-transformation.org/Transform/JavaParserGenerators . I will tinker with those as well, although a lot of them seem to be outdated. – Stye Jun 12 '14 at 14:24
  • Java parser generators won't help you much. First, these are mostly parser generators implemented in Java (which you may want to use because you like Java, but isn't specifically part of your problem), but as parser generators all they will really let you do is define a grammar and give you a little bit (not a lot) of help in building an AST. They won't help you at *all* in inspecting/modifying or prettyprinting the tree. If your goal is to modify Java code, using these just means you'll get stuck rebuilding a bunch of infrastructure, and not actually modifying Java code. You *want* a PTS. – Ira Baxter Jun 12 '14 at 14:42
  • See my essay on Life After Parsing. http://www.semanticdesigns.com/Products/DMS/LifeAfterParsing.html – Ira Baxter Jun 12 '14 at 14:45
  • I've used [javaparser](https://github.com/matozoid/javaparser) before. I wrote an article about what I did here: http://ismail.badawi.io/blog/2013/05/03/writing-a-code-coverage-tool/ – Ismail Badawi Jun 12 '14 at 18:02
  • @IsmailBadawi: For comparison, see a paper on using a PTS to implement test coverage, that I wrote back in 2002: http://www.semanticdesigns.com/Company/Publications/TestCoverage.pdf – Ira Baxter Jun 12 '14 at 21:09
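
The nesting problem raised in the comments above can be made concrete. A regular expression cannot count brace depth, so even the smallest hand-rolled approach needs a counter, plus logic to skip braces that appear inside string literals, character literals and comments. The following is a minimal sketch under those assumptions, not a Java lexer: `BraceMatcher` and `findClosingBrace` are hypothetical names, and text blocks and Unicode escapes are deliberately not handled.

```java
public class BraceMatcher {

    /**
     * Returns the index of the '}' that closes the '{' located at openIndex,
     * or -1 if the braces are unbalanced. Assumes src.charAt(openIndex) == '{'.
     * Braces inside string/char literals and comments are ignored.
     */
    public static int findClosingBrace(String src, int openIndex) {
        int depth = 0;
        int i = openIndex;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean hasNext = i + 1 < src.length();
            if (c == '/' && hasNext && src.charAt(i + 1) == '/') {
                // Line comment: jump to the character after the next newline.
                int newline = src.indexOf('\n', i);
                if (newline < 0) return -1;
                i = newline + 1;
                continue;
            }
            if (c == '/' && hasNext && src.charAt(i + 1) == '*') {
                // Block comment: jump past the terminating "*/".
                int end = src.indexOf("*/", i + 2);
                if (end < 0) return -1;
                i = end + 2;
                continue;
            }
            if (c == '"' || c == '\'') {
                i = skipLiteral(src, i);
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) return i;
            }
            i++;
        }
        return -1;
    }

    /** Returns the index just past the quoted literal that starts at 'start'. */
    private static int skipLiteral(String src, int start) {
        char quote = src.charAt(start);
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') { i += 2; continue; } // skip the escaped character
            if (c == quote) return i + 1;
            i++;
        }
        return src.length(); // unterminated literal: treat the rest as literal text
    }
}
```

Even this small scanner already has to know three lexical rules of Java, which is the practical reason the comments recommend a real parser once the task grows beyond one narrow case.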

2 Answers

You can do this reliably with a program transformation system (PTS). These are IDE-independent.

One of these is our DMS Software Reengineering Toolkit. OP can accomplish his specific task with something like the following DMS metaprogram (not tested, and it doesn't handle all the edge cases):

 (= parse_Tree  (Domains:Java:Parser:ParseFile (. "path/to/.java")))
 (local (= [method_tree AST:Node] (AST:ScanTree parse_Tree (Registry:Pattern (. `any_method'))))
      (ifthen (&& (~= method_tree AST:NullTree)
                  (Registry:PatternMatch method_tree (. `TestClass'))
                  (~= AST:NullTree (AST:ScanTree method_tree 
                                       (Registry:Pattern (. `abcdef_identifier')))))
          (Registry:ApplyTransform method_tree (. `insert_MyAnnotation'))
      )ifthen
 )local
 (Registry:PrettyPrintToFile method_tree (. "path/to/.java"))

DMS's metaprogramming language looks like Lisp, with prefix operators. (Get over it :-) ParseFile reads a source file and builds an AST, parked in parse_Tree. ScanTree scans a tree looking for a point where the supplied predicate ("Registry:Pattern (. `any_method')") is true, and returns the matching subtree or null. Registry:PatternMatch checks that a pattern predicate is true at the root of the specified tree. Registry:ApplyTransform applies a source-to-source transformation to modify the tree.

This metaprogram is supported by a set of named patterns, which make it easy to express tests/transforms on trees without knowing every last detail of the tree structure. These are oversimplified for presentation purposes:

 default domain Java~v7;

 pattern any_method(p: path_to_name, name: method_name, args: arguments,
                    b: body, a: annotations):declaration =
    " \p \name(\args) \a \b ";  -- doesn't handle non-functions but easily adjusted

 pattern TestClass(p: path_to_name, name: method_name, args: arguments,
                    b: body, a: annotations):declaration =
    " \p \name(\args) [Test.class] \b ";

 pattern abcdef_identifier():IDENTIFIER =
      "abcdef";

 rule insert_MyAnnotation(p: path_to_name, name: method_name, args: arguments,
                          b: body, a: annotations):declaration =
    " \p \name(\args) \a \b "
    ->
    " \p \name(\args) \a [myAnnotation] \b ";

The quote marks are metaquotes; they delineate the boundaries between the syntax of the pattern matching language as a whole, and code fragments written in the target language (in this case, Java, because of the domain declaration). Inside the meta quotes is target (Java) language syntax, with escaped identifiers representing pattern variables that correspond to specific tree node types. You have to know the rough structure of the grammar to write these, but notice we didn't really dive into details of how annotations or anything is formed.

Arguably the "any_method" and "TestClass" patterns could be folded into one (in fact, just the TestClass pattern itself, since it is a pure specialization of "any_method").

The last rule (the others are patterns, only meant for matching) says, "if you see X, replace it by Y". What the specific rule does is pattern match to the method with some list of annotations, and add another one.

This is the way to reliable program transformations. If you don't want to use DMS (a commercial product), check out the Wikipedia page for alternatives.
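
For readers without access to a PTS, the match-then-rewrite-then-prettyprint shape described above can be imitated in plain Java on a hand-built tree. This is only a toy sketch and makes several assumptions: `MethodNode`, `matches`, `insertAnnotation` and `prettyPrint` are hypothetical names (not DMS or any library API), no parsing is performed, and the body is kept as a plain string, which a real PTS would not do. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a PTS: one hand-built node type, one pattern, one rewrite rule. */
public class RewriteSketch {

    /** Hypothetical AST node for a method declaration. */
    public record MethodNode(List<String> annotations, String name, String body) {}

    /** Pattern: the method carries @Test and its body mentions the given text. */
    public static boolean matches(MethodNode m, String identifier) {
        return m.annotations().contains("Test") && m.body().contains(identifier);
    }

    /** Rule ("if you see X, replace it by Y"): returns a copy with one more annotation. */
    public static MethodNode insertAnnotation(MethodNode m, String annotation) {
        List<String> updated = new ArrayList<>(m.annotations());
        updated.add(annotation);
        return new MethodNode(List.copyOf(updated), m.name(), m.body());
    }

    /** Prettyprinter: regenerates source text from the tree alone. */
    public static String prettyPrint(MethodNode m) {
        StringBuilder out = new StringBuilder();
        for (String a : m.annotations()) {
            out.append('@').append(a).append('\n');
        }
        return out.append("public void ").append(m.name()).append("() {\n    ")
                  .append(m.body()).append("\n}\n").toString();
    }

    public static void main(String[] args) {
        MethodNode before = new MethodNode(List.of("Test"), "checkIt", "int abcdef = 1;");
        MethodNode after = matches(before, "abcdef")
                ? insertAnnotation(before, "MyAnnotation")
                : before;
        System.out.print(prettyPrint(after));
    }
}
```

Note that `prettyPrint` regenerates the text purely from the tree, so whatever layout the original file had is lost; that is the trade-off discussed in the comments below.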

Ira Baxter
  • Thank you so much :) Don't worry, I can handle Lisp-like languages and RPN (being Polish myself, huh). I can't test your solution (because it's commercial). Your solution is pretty similar in what I would expect of implementing an ANTLR grammar and working from there. After a day's worth of research I can see how complex the problem is. I'm marking your answer as a solution, because it does extensively answer my question, better than any other answer to a similar question to mine ever has in my opinion. – Stye Jun 12 '14 at 16:21
  • However, as it's too much work for just that specific case (it actually is the only thing I need to do to the source code), I've decided to use regex (to Jamie Zawinski's horror) for that specific thing (less hassle). Parsing out @Test methods, finding the method's main curly brackets and then parsing what's inside, together with providing a regex for annotations above it, suddenly sounds more doable after realizing how complex this gets once u want a real and clean solution. You gave me (and, I believe, future inquirers) a really interesting read. Thanks a lot for the effort. – Stye Jun 12 '14 at 16:26
  • P.S. Reading through all that made me recall Futamura projections and all the goodies u can do with syntactic analysis. – Stye Jun 12 '14 at 16:29
  • If you stick to *pure* ASTs, our solution might look similar from the perspective of "this only fiddles with ASTs". I think you underrate the value of the source-language style pattern matching/tree rewriting. If you are only going to do this example, maybe so; what invariably happens is your ambitions grow and then all that procedural code you write to climb up and down the ASTs and check node types gets really out of hand. ... – Ira Baxter Jun 12 '14 at 16:59
  • ... Secondly, if you do anything that needs deeper semantic properties ("What is the type of this symbol? Who does this method call? Where did this value come from?") you discover why ASTs are simply, simply NOT ENOUGH. Aho's book on compilers doesn't end at Chapter 3 on ASTs for the same reason: you need a lot more infrastructure support "beyond parsing", and you really don't want to build it all yourself, for the same reason you don't want to build a steel factory just because you need a spoon. (See my comment on Life After Parsing.) – Ira Baxter Jun 12 '14 at 17:02
  • FWIW: tree-to-tree rewrites on just syntax are equivalent to Post systems, thus Turing capable. You can do *anything* with them. You can do anything with a tape-style Turing machine too, but nobody does; it isn't expressive enough. ASTs are very useful, but they aren't the answer to serious program analysis and transformation by themselves. – Ira Baxter Jun 12 '14 at 17:03
  • ... If you *can* make regex work for your very special case, then that's what I would do, too. In almost all cases where people try to do this with real code, it fails spectacularly when it comes to reliability. PTSs can be reliably applied to millions of lines of code (we have done a lot of this with DMS; it's how we make a living). – Ira Baxter Jun 12 '14 at 17:09
  • I do need just a spoon, so no factory needs to be built : ). Pretty-printing shocked me though. It sounds as though those kind of programs completely discard the input source, extract what data is needed and then recreate it with that knowledge. I would have thought some sort of pointers would be allocated to each syntactic structure (methods, fields etc.) while building the tree in the (non-discarded) source, keeping all the whitespaces and whatnot intact. The more I know :) – Stye Jun 12 '14 at 17:32
  • Yes, those tools *do* discard the source. The problem is that any synthesized ASTs don't have source; how are you going to prettyprint them? And if you can do that for synthesized tree nodes, you can do it for un-changed tree nodes. See my discussion on prettyprinting: http://stackoverflow.com/a/5834775/120163 – Ira Baxter Jun 12 '14 at 17:39
  • Thanks. That's the original problem too. I need those changes to be SCM-friendly. I'll stick with regex-backed specialiser :) – Stye Jun 12 '14 at 17:41
  • If you insist on preserving the original whitespace complete with mixed blanks and tabs (is that what you mean by "SCM friendly"?), then yes, something will need to pick that whitespace up. As a practical matter, you can do this a number of ways: store the whitespace on tree nodes, refetch the whitespace from the original file based on recorded line numbers in nodes, compare original file to regenerated file and retain lines that differ only in whitespace. As a practical matter, we have not found this to be a big problem. – Ira Baxter Jun 14 '14 at 16:11
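
The alternative Stye describes in the comments above, keeping the original text and splicing new text in at a recorded position, can be sketched as follows. This is a minimal illustration under stated assumptions: `OffsetInsert` and `insertAnnotationAbove` are hypothetical names, the caller is assumed to already know the character offset of the method declaration (from a parser or a hand-written scanner), and the file is assumed to use `\n` line endings.

```java
public class OffsetInsert {

    /**
     * Inserts the annotation on its own line directly above the line that
     * contains declOffset, copying that line's leading blanks/tabs so the
     * rest of the file is left byte-for-byte unchanged.
     */
    public static String insertAnnotationAbove(String src, int declOffset, String annotation) {
        // Start of the line containing declOffset (0 if it is on the first line).
        int lineStart = src.lastIndexOf('\n', declOffset - 1) + 1;

        // Collect the existing indentation of that line.
        int i = lineStart;
        while (i < src.length() && (src.charAt(i) == ' ' || src.charAt(i) == '\t')) {
            i++;
        }
        String indent = src.substring(lineStart, i);

        return src.substring(0, lineStart) + indent + annotation + "\n" + src.substring(lineStart);
    }
}
```

Because only one line is added, a diff in the SCM shows exactly that line. The hard part remains computing `declOffset` reliably, which is where a parser earns its keep.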

Look up javax.lang.model and the annotation processing API, javax.annotation.processing. This lets you write plugins for the javac compiler in a standardized way; all compilers support it. You can find tutorials and talks that cover this online.

There are a couple of limitations, for example I don't think you can rewrite the source of a file, but you can generate a new file (or class) with the annotation present. Also you can't model code inside method bodies.
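
As a rough illustration of what this API can and cannot do, here is a minimal processor sketch that reports every method annotated with a test annotation. It assumes the annotation in question is JUnit 4's `org.junit.Test` (the question only says `Test.class`), and `TestMethodLister` is a hypothetical class name. It can see method names and signatures but, as noted above, not the statements inside the body, and it cannot edit the existing source file.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Assumption: the target annotation is JUnit 4's org.junit.Test.
@SupportedAnnotationTypes("org.junit.Test")
public class TestMethodLister extends AbstractProcessor {

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element e : roundEnv.getElementsAnnotatedWith(annotation)) {
                if (e.getKind() == ElementKind.METHOD) {
                    // Only the declaration is visible here, not the method body.
                    processingEnv.getMessager().printMessage(
                            Diagnostic.Kind.NOTE,
                            "Found @Test method: " + e.getSimpleName(), e);
                }
            }
        }
        return false; // leave the annotation available to other processors
    }
}
```

Such a processor would typically be run by passing `-processor TestMethodLister` to javac with the compiled class on the processor path.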

  • If it can't model the code content of a method body, how will this answer OP's query m.getBody().contains("abcdef")? – Ira Baxter Jun 12 '14 at 15:26