2

Does anyone have a working C grammar for ANTLRv4 besides the one on GitHub?

I can't get the existing one to work at all; it won't even parse the sample files. It may be that I'm missing something, but I haven't had a problem with any of the other grammars.

I was thinking about modifying the existing one or writing my own, but I don't really have the time; my time on this project is limited.

Any help much appreciated.

Thanks,

Katy

Katy
  • You should describe the problem you have with the sample files. Maybe the problem is not in the grammar but in the sample files or in the way you generate the parser? – quepas Dec 06 '16 at 09:26
  • You don't want to write your own or debug somebody else's C parser. C is a hard language to parse in practice. See http://stackoverflow.com/a/24777596/120163 – Ira Baxter Dec 06 '16 at 09:35

2 Answers

3

You cannot create a working C grammar in less than a few months; it is more complex than it seems. In my opinion, parsing all of C (without the preprocessor) well takes about 6 months.

For example, the first impression is that the C grammar is context-free, but in reality it is context-sensitive.
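A minimal sketch of that context sensitivity, using the well-known typedef ambiguity: the statement `T * x;` is a declaration when `T` names a type and a multiplication when `T` is a variable. The function names `as_declaration` and `as_expression` are made up for this illustration.

```c
/* Case 1: T names a type, so "T * x;" declares x as a pointer to int. */
int as_declaration(void) {
    typedef int T;
    int value = 7;
    T * x;            /* declaration: x has type int* */
    x = &value;
    return *x;
}

/* Case 2: T and x are variables, so the same token sequence "T * x;"
   is an expression statement whose result is discarded. Compilers
   may warn that the statement has no effect; that is intentional here. */
int as_expression(void) {
    int T = 6, x = 7;
    T * x;            /* expression: multiplies T by x, result unused */
    return T * x;
}
```

The parser cannot pick between the two readings from the tokens alone; it has to know what `T` was declared as earlier in the scope.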

Take the official grammar from Appendix A of the ISO Standard and start implementing sublanguages from it, inserting nonterminals one by one.

alinsoar
  • You can define a context-free parser for C and then parse it. You can't do that with ANTLR or other older parsing technologies. You can hack a parser to take into account key context sensitivities, but that's a hack. See http://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737#1004737. And yes, then you have a lot more work to do to handle all the rest of the complexities of parsing C source code. – Ira Baxter Dec 07 '16 at 01:39
  • Correct. The context-sensitive part is known in the C literature as `The lexer hack` (https://en.wikipedia.org/wiki/The_lexer_hack). And indeed the grammar in the standard needs some backtracking in places, but not a significant amount. Or you can generate both parse trees and eliminate the wrong one once the ambiguity is resolved at the end, etc. The point is that the C grammar is complex, so writing one is not a solution for @OP: ``I don't really have the time``... – alinsoar Dec 07 '16 at 10:11
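For readers unfamiliar with the lexer hack mentioned in the comments, here is a rough sketch of the idea, not a real implementation: the parser records typedef names in a table, and the lexer consults that table to decide whether an identifier-shaped lexeme is a type name. All names here (`register_typedef`, `classify`, `TOK_TYPE_NAME`, the fixed table size) are hypothetical, and the sketch ignores scoping, which a real C front end has to handle.

```c
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count = 0;

/* Called by the parser once it has finished a typedef declaration.
   Returns 1 on success, 0 if the (illustrative) table is full. */
int register_typedef(const char *name) {
    if (typedef_count >= MAX_TYPEDEFS) {
        return 0;
    }
    typedef_names[typedef_count++] = name;
    return 1;
}

/* Called by the lexer for every identifier-shaped lexeme: the token
   kind depends on what the parser has seen so far. */
enum token_kind classify(const char *lexeme) {
    for (int i = 0; i < typedef_count; i++) {
        if (strcmp(typedef_names[i], lexeme) == 0) {
            return TOK_TYPE_NAME;
        }
    }
    return TOK_IDENTIFIER;
}
```

Before `register_typedef("T")` is called, `classify("T")` yields `TOK_IDENTIFIER`; afterwards it yields `TOK_TYPE_NAME`. This feedback from parser to lexer is what makes the approach a hack.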
0

You can test the rule `translationunit` instead of the rule `primaryexpression`.