ANTLRWorks 1.4.3 can't properly read extended-ASCII characters

Question

I'm working on a fairly standard compiler project for which I picked ANTLR as the parser generator. While updating an existing grammar from v2 to v3, I noticed that ANTLRWorks, the official IDE for ANTLR, wasn't displaying any of the extended-ASCII characters in the file properly. Even after I used Notepad++ to convert the file from ASCII to UTF-8, ANTLRWorks still displayed those characters as squares. In Notepad++ they display fine.

Since this glitch means that ANTLRWorks mauls the file when I save it, I cannot use it as an editor any more, which is rather annoying. Has anyone else here encountered this issue and maybe solved it? Much obliged.

[edit]: the specific issue occurs with the latest version of ANTLRWorks (downloaded it yesterday) and with the vams.g grammar file I got from http://www.antlr.org/grammar/1086696923011/vhdlams/index.html

Bart Kiers · Accepted Answer · 2011-12-04T21:31:43.790

I cannot reproduce this with ANTLRWorks 1.4.3.

If I create a dummy grammar:

grammar T;
parse : . ;
Any   : . ;

and paste the complete extended ASCII set in a multi-line comment:

grammar T;

/*
€

‚
ƒ

...

ÿ
*/

parse : . ;
Any   : . ;

there's no problem. It doesn't matter if I copy the chars with ANTLRWorks, or with a normal editor and then edit the existing grammar with ANTLRWorks: the characters all stay the same after saving inside ANTLRWorks.

On a related note: ANTLR versions 3.0 to 3.3 still have some dependencies on ANTLR 2.7 classes, which might cause org.antlr.Tool to trip over certain characters outside the ASCII set. In that case, use ANTLR 3.4, which no longer has these old dependencies.

EDIT

I suspect there's some odd byte somewhere in the original grammar that is causing all the mayhem. I quickly copied only the rules from the original grammar, changed all v2.7 syntax to v3 syntax (changed double-quoted literals to single-quoted ones, changed protected to fragment, and commented out some custom code) and saved it in a new file. This file could be opened (and saved) by ANTLRWorks or a plain text editor without the extended-ASCII chars getting mangled.

Here is the ANTLR v3 version of said grammar: http://pastebin.com/zU4xcvXt (the grammar is too big to post on SO...)
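
If you want to hunt for such an odd byte yourself, a small scan of the raw file can help. The following is only a sketch: the class and method names (OddByteScan, nonAsciiOffsets, isValidUtf8) are made up for illustration and are not part of ANTLR or ANTLRWorks, and it assumes the file is expected to be either plain ASCII or UTF-8. It lists the offset of every byte above 0x7F and reports whether the file as a whole decodes as strict UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting a grammar file; not an ANTLR API.
final class OddByteScan {

    /** Returns the offsets of all bytes outside the 7-bit ASCII range. */
    static List<Integer> nonAsciiOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) > 0x7F) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OddByteScan <grammar-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("valid UTF-8: " + isValidUtf8(data));
        for (int offset : nonAsciiOffsets(data)) {
            System.out.printf("offset %d: 0x%02X%n", offset, data[offset] & 0xFF);
        }
    }
}
```

A file that contains bytes above 0x7F but is not valid UTF-8 was most likely saved in a single-byte encoding, which would be one plausible explanation for an editor showing squares when it guesses the encoding wrong.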

EDIT II

> Is the grammar name useful for anything beyond just giving it a label?

No, it's not. It's, as you mentioned, only used to give a parser or lexer a name.

There are 4 types of grammars in ANTLR:

combined grammar, which looks like grammar T;, generating TLexer.java and TParser.java source files;
parser grammar, looking like parser grammar TP;, generating a TP.java source file;
lexer grammar, looking like lexer grammar TL;, generating a TL.java source file;
tree grammar, looking like tree grammar TWalker;, generating a TWalker.java source file.
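
Since the grammar name also determines what the file has to be called (a combined grammar X must live in X.g, see the comments), here is a rough sketch of how one could check that before opening a file in ANTLRWorks. The class GrammarNameCheck and its methods are hypothetical, not part of ANTLR. It assumes the same X.g rule holds for lexer, parser and tree grammars, and it does not skip a header that happens to sit inside a block comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; ANTLR itself does this check internally in its own way.
final class GrammarNameCheck {

    // Matches a v3 grammar header such as "grammar T;" or "lexer grammar TL;"
    // at the start of a line. Group 2 captures the grammar name.
    private static final Pattern HEADER = Pattern.compile(
            "^\\s*(?:(lexer|parser|tree)\\s+)?grammar\\s+([A-Za-z_][A-Za-z0-9_]*)\\s*;",
            Pattern.MULTILINE);

    /** Returns the declared grammar name, or null if no header was found. */
    static String grammarName(String source) {
        Matcher m = HEADER.matcher(source);
        return m.find() ? m.group(2) : null;
    }

    /** Returns true if fileName equals the declared grammar name plus ".g" (case-sensitive). */
    static boolean matchesFileName(String source, String fileName) {
        String name = grammarName(source);
        return name != null && fileName.equals(name + ".g");
    }

    public static void main(String[] args) {
        String source = "grammar VAMS;\nparse : . ;\n";
        System.out.println(grammarName(source));               // VAMS
        System.out.println(matchesFileName(source, "VAMS.g")); // true
        System.out.println(matchesFileName(source, "vams.g")); // false
    }
}
```

The comparison is deliberately case-sensitive, so a grammar declared as VAMS in a file called vams.g is reported as a mismatch.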

edited Dec 04 '11 at 21:31

answered Dec 04 '11 at 09:16

Bart Kiers

I edited the question to add a link to the specific grammar file I'm modifying. Please try to reproduce the issue with it :) – MayaPosch Dec 04 '11 at 12:41
@MayaPosch, yes, that file reproduces what you describe. Odd... I'll have a closer look at it tomorrow (if the question hasn't been answered before that time). – Bart Kiers Dec 04 '11 at 19:13
@MayaPosch, I couldn't let it rest... Have a look at my EDIT. – Bart Kiers Dec 04 '11 at 20:03
Thank you, Bart :) I copied the paste to a file in Notepad++ and after saving I could open it in ANTLRWorks without issues. I'll try adding the additional sections (options, header) back in and see how it goes. Not sure if we'll ever figure out what exactly caused this issue, though... it seems to be related to how ANTLRWorks handles files. BTW: why would ANTLRWorks mark VAMS as 'invalid grammar name'? Edit: Oh, seems like the grammar & filename have to match up :) – MayaPosch Dec 04 '11 at 21:00
@MayaPosch, you're welcome. If you named your grammar file anything other than `VAMS.g`, ANTLRWorks will complain about its name: a combined `grammar X` (including parser _and_ lexer rules) must be saved in a file named `X.g`. – Bart Kiers Dec 04 '11 at 21:06
Is the grammar name useful for anything beyond just giving it a label? – MayaPosch Dec 04 '11 at 21:20
@MayaPosch, see **EDIT II** . – Bart Kiers Dec 04 '11 at 21:32
Awesome, thank you very much :) I put the additional stuff back into the grammar file and re-enabled the custom bits without anything breaking. – MayaPosch Dec 04 '11 at 21:51
@MayaPosch, you're most welcome. Best of luck with your project :) – Bart Kiers Dec 04 '11 at 21:55
