I need to parse Bash in Java and generate an AST. The goal is to analyse imported shell scripts and check them for potential issues. The analysis is more involved than what can be done with regex.
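To make the goal concrete, here is a minimal sketch of the kind of AST I have in mind. All names (`MiniShellAst`, `Node`, `Command`, `Pipeline`, `parse`) are hypothetical and made up for illustration. It is a toy: it only handles simple commands joined by `|`, and it ignores quoting, expansions, redirections, and everything else a real Bash parser must deal with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MiniShellAst {
    /** Base type for all AST nodes (hypothetical names, for illustration only). */
    interface Node {}

    /** A simple command: a name followed by its arguments, e.g. "grep -i foo". */
    static final class Command implements Node {
        final String name;
        final List<String> args;

        Command(String name, List<String> args) {
            this.name = name;
            this.args = args;
        }

        @Override
        public String toString() {
            return "Command(" + name + ", " + args + ")";
        }
    }

    /** A pipeline: one or more commands joined by '|'. */
    static final class Pipeline implements Node {
        final List<Command> commands;

        Pipeline(List<Command> commands) {
            this.commands = commands;
        }

        @Override
        public String toString() {
            return "Pipeline" + commands;
        }
    }

    /**
     * Parses a single line by splitting on '|' and then on whitespace.
     * Assumption: no quoting, escaping, or operators other than '|'.
     */
    static Pipeline parse(String line) {
        List<Command> commands = new ArrayList<>();
        for (String segment : line.split("\\|")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("empty command in: " + line);
            }
            List<String> words = Arrays.asList(trimmed.split("\\s+"));
            List<String> args = new ArrayList<>(words.subList(1, words.size()));
            commands.add(new Command(words.get(0), args));
        }
        return new Pipeline(commands);
    }

    public static void main(String[] args) {
        // Print the tree for a three-stage pipeline.
        System.out.println(parse("cat /etc/passwd | grep -i root | wc -l"));
    }
}
```

With a tree like this, a check such as "flag every pipeline that starts with `cat` on a single file" becomes a walk over `Pipeline.commands` instead of a fragile pattern match. What I need is the same idea, but covering the full Bash grammar.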
Research I have already done:
- ANTLR is often great for this, but there isn't an open-source grammar for bash or shell
- The official bash grammar (parse.y) is written in yacc, which is heavily tied to C
- There are some yacc-like parsers for Java, e.g. JavaCC. However, converting the bash yacc looks like a big job.
- There is a BNF grammar for bash, but it only covers bash 2.0 and misses many features. There are several BNF parsers for Java, e.g. bullwinkle
At this point I'm pretty stuck. Some ideas I had:
- Update the BNF grammar to support newer bash features
- Find some semi-automated way to convert the yacc grammar to a format that can be used with Java
- Run the yacc parser as native code and interface with JNI
Any further suggestions gratefully received!