3

I am working on a project to develop static source code analysis tools. The source code is written in multiple proprietary languages that interact with one another. I am therefore looking for a project that defines an abstract model/AST and supports data flow analysis, so that I can translate each proprietary language into that model and then analyze the data flow/tree.

Does such a project exist?

tophersmith116

3 Answers

2

Not open source, but designed and proven useful for building tools that handle multiple, complex languages: our DMS Software Reengineering Toolkit.

DMS contains strong parsing machinery (capable of handling difficult languages such as C++) that builds ASTs automatically from just a grammar description, as well as libraries that support construction of symbol tables and various kinds of control and data flow analysis.

OP will have to provide grammars and semantic descriptions of his proprietary languages, but I think he is expecting that. If he wants to model flows across the languages, he'll have to organize his flow analyses for the individual languages to be compatible. The fact that DMS uses uniform infrastructure and data structures to support all these activities, even for different languages, will make this easier.
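To make the "compatible flow analyses" point concrete, here is a minimal sketch in Python. It is not DMS's API; the `Assign` record, the `flows_from` function, and the `LangA`/`LangB` front ends are all hypothetical names invented for illustration. The assumption is that each language's front end lowers its code to the same statement form, so one flow-insensitive analysis can follow data across the language boundary.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Assign:
    """Hypothetical language-neutral statement: `target` is computed from `sources`."""
    target: str
    sources: Tuple[str, ...]
    origin: str  # name of the source language that produced this statement


def flows_from(program: Iterable[Assign], seed: str) -> Set[str]:
    """Return every variable that may receive data from `seed`.

    Flow-insensitive: statement order is ignored and the set is grown
    until a fixed point is reached.
    """
    stmts = list(program)
    reached = {seed}
    changed = True
    while changed:
        changed = False
        for stmt in stmts:
            if stmt.target not in reached and reached.intersection(stmt.sources):
                reached.add(stmt.target)
                changed = True
    return reached


# Assumed output of two separate (hypothetical) front ends.
lang_a = [Assign("a.total", ("a.price", "a.qty"), "LangA")]
lang_b = [
    Assign("b.invoice", ("a.total",), "LangB"),  # reads a value defined in LangA
    Assign("b.label", ("b.name",), "LangB"),
]

result = flows_from(lang_a + lang_b, "a.price")
print(sorted(result))
```

Because both front ends emit the same `Assign` shape, the flow from `a.price` into `b.invoice` is found without the analysis knowing which language each statement came from. A real tool would also need control flow, scoping, and aliasing, which this sketch leaves out.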

He should not expect a project involving multiple languages to be easy or quick, regardless of the framework he finds. Our intention with DMS was to make this practical.

Ira Baxter
0

I think the Object Management Group's (OMG) Specification for the Knowledge Discovery Metamodel (KDM) is kind of in the space you're looking for. (See http://www.omg.org/spec/KDM/). It's part of the Architecture Driven Modernization (ADM) activity at the OMG. KDM has been republished by ISO as ISO/IEC 19506:2012(E).

From the introduction:

This International Standard defines a meta-model for representing existing software assets, their associations, and operational environments, referred to as the Knowledge Discovery Meta-model (KDM).

You'll likely have to do most of the heavy lifting yourself, but at least the metamodel has been provided.

Erick G. Hagstrom
  • I note this is just a standard. If OP wants to make any progress with this standard, he will have to find a tool that supports it. It isn't obvious there is an open source version of such a tool. Then he'll have to encode his languages into the framework. – Ira Baxter Feb 13 '16 at 10:50
0

More of a side remark: if you are not too interested in syntactic details and are free to choose your platform, you might as well analyze code for a VM, such as .NET bytecode. There are compilers for C#, F#, C++ (C++/CLI), and Visual Basic (most of them, of course, from a well-known, large software company :-) ). They all compile to bytecode programs, which can be inspected with tools like Mono.Cecil, which let you construct control flow graphs, etc.
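As a rough illustration of what "constructing a control flow graph" from bytecode involves, here is a small Python sketch. It does not use Mono.Cecil or real .NET bytecode; the instruction tuples, the simplified opcode names (`br`, `brtrue`, `ret`), and the `build_cfg` function are assumptions made up for this example. It splits a linear instruction list into basic blocks and records each block's successors.

```python
def build_cfg(instrs):
    """Build a control flow graph from a linear instruction list.

    instrs: list of (opcode, operand) tuples. For the hypothetical
    opcodes 'br' (unconditional jump) and 'brtrue' (conditional jump),
    the operand is the index of the jump target.
    Returns a dict mapping each basic block's first index to the list
    of first indices of its successor blocks.
    """
    if not instrs:
        return {}

    # A leader starts a basic block: the entry point, every jump target,
    # and every instruction that follows a jump or a return.
    leaders = {0}
    for i, (op, arg) in enumerate(instrs):
        if op in ("br", "brtrue"):
            leaders.add(arg)
        if op in ("br", "brtrue", "ret") and i + 1 < len(instrs):
            leaders.add(i + 1)

    order = sorted(leaders)
    cfg = {}
    for n, start in enumerate(order):
        end = order[n + 1] if n + 1 < len(order) else len(instrs)
        op, arg = instrs[end - 1]  # last instruction of this block
        falls_through = [end] if end < len(instrs) else []
        if op == "br":
            cfg[start] = [arg]
        elif op == "brtrue":
            cfg[start] = [arg] + falls_through
        elif op == "ret":
            cfg[start] = []
        else:
            cfg[start] = falls_through
    return cfg


# Hypothetical bytecode for: return 1 if arg else 0
code = [
    ("ldarg", 0),     # 0
    ("brtrue", 4),    # 1: jump to 4 if the argument is true
    ("ldc", 0),       # 2
    ("br", 5),        # 3
    ("ldc", 1),       # 4
    ("ret", None),    # 5
]

cfg = build_cfg(code)
print(cfg)
```

Once the code is in this block-and-edge form, the source language no longer matters to the analysis, which is the appeal of working at the VM level.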

Pachelbel
  • This presupposes the OP's set of languages already translate to a well-defined VM or even the JVM or CLI. That may be the case, but if not, it converts one problem into another: how does he get VM code, before he starts? – Ira Baxter Feb 16 '16 at 17:55
  • That's true, there are lots of preconditions. However, if they are met, he can obtain static analysis information regardless of the input language. An example of such a static analysis tool is "CodeContracts" or "Clousot" by MS Research. Another idea might be LLVM, but I think their IR does not have a "unified" ABI, i.e. things like method calls might be modelled differently, depending on the language. – Pachelbel Feb 16 '16 at 22:31