As a follow-up to my previous question (ANTLRWorks 1.4.3 can't properly read extended-ASCII characters), I created a simple text file using a hex editor:

' ' '£' '°' 'ç'

Or in hex:

27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27

The resulting file reads fine in Notepad++. Upon opening in ANTLRWorks 1.4.3 the (extended) ASCII characters are displayed as square boxes. Upon saving the file after adding and removing a space at the end of the line, the hexadecimal file view looks as follows:

27 3F 20 27 A3 27 20 27 B0 27 20 27 3F

For some reason, the first character between apostrophes (a non-breaking space, A0, rather than a regular space, 20) and the apostrophe following it were replaced by a single question mark (3F). The c with cedilla (E7) and the apostrophe following it were likewise replaced by a single question mark.

It seems that the presence of extended ASCII characters somehow results in things going horribly wrong. Can anyone here replicate this issue and/or offer a possible reason and solution?
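
For anyone who wants to replicate this without a hex editor, here is a minimal Python sketch that writes the exact bytes from the first hex dump and prints them back. The file name and the helper names `write_test_file` and `hex_dump` are hypothetical, chosen only for illustration.

```python
import os
import tempfile

# Byte sequence from the first hex dump above.
ORIGINAL = bytes.fromhex("27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27")

def write_test_file(path):
    """Write the raw bytes unchanged (binary mode avoids any re-encoding)."""
    with open(path, "wb") as f:
        f.write(ORIGINAL)

def hex_dump(path):
    """Return the file's bytes as space-separated uppercase hex pairs."""
    with open(path, "rb") as f:
        return " ".join("%02X" % b for b in f.read())

# Hypothetical file name in the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "extended_ascii_test.txt")
write_test_file(path)
print(hex_dump(path))
```

Running `hex_dump` again after opening and saving the file in ANTLRWorks should show whether the bytes were altered.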

Thanks in advance.

MayaPosch

1 Answer

You could use Unicode escapes instead. To match the pound sign, for example, you'd write:

PoundSign : '\u00A3';

instead of:

PoundSign : '£';

They (should) both match the same character, and the first may very well not be mangled.
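
If there are many such characters, the escapes can be generated rather than typed by hand. The following is a minimal Python sketch, not an ANTLR feature: it assumes the file's bytes are ISO-8859-1 (Latin-1), which is consistent with the hex dump in the question, and `to_antlr_escapes` is a hypothetical helper name.

```python
# Bytes from the question's hex dump, assumed to be ISO-8859-1 (Latin-1).
raw = bytes.fromhex("27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27")

def to_antlr_escapes(text):
    """Replace every non-ASCII character with a \\uXXXX escape.

    Only handles code points up to U+FFFF, which is always the case
    for text decoded from Latin-1.
    """
    return "".join(
        ch if ord(ch) < 128 else "\\u%04X" % ord(ch)
        for ch in text
    )

escaped = to_antlr_escapes(raw.decode("latin-1"))
print(escaped)  # prints '\u00A0' '\u00A3' '\u00B0' '\u00E7'
```

In Latin-1 each byte value equals its Unicode code point, so A0, A3, B0 and E7 become `\u00A0`, `\u00A3`, `\u00B0` and `\u00E7`.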

Bart Kiers
  • I guess that works... it doesn't mangle anything this way at least, but it is rather annoying to have to use this approach when there is seemingly no reason why it has to be done like this :( – MayaPosch Dec 11 '11 at 18:15
  • @MayaPosch, yeah, I agree, it's a hassle, but I guess it's either that, or don't use ANTLRWorks at all. Raising a bug (which it most probably is) won't do much good, I think, since with the upcoming release of ANTLR (version 4), ANTLRWorks will be completely rewritten and integrated with ANTLR. – Bart Kiers Dec 12 '11 at 06:32
  • Ah, I see. Any idea when that version will be out? :) – MayaPosch Dec 12 '11 at 12:19
  • @MayaPosch, I haven't seen a date come by on the ANTLR mailing lists I read. I expect a beta release to come out before v4 is officially released (in other words, it may take a while; not within the next few months, is my guess). – Bart Kiers Dec 12 '11 at 12:28
  • Alright, Unicode code points it is, then. Thanks for offering this alternative :) – MayaPosch Dec 12 '11 at 13:16