
Can I somehow insert the ATN state numbers into the grammar where they occur?

I'm trying to make a tool that automatically adds all inevitable (always required) literal values into a document. For example, given the following rule:

statement
    :   block
    |   'assert' expression (':' expression)? ';'
    |   'if' '(' expression ')' statement ('else' statement)?
    ;

If the user writes 'assert', I'd add the ';', and if the user enters 'if', I'd like to add the parentheses '(' and ')'.

I'm thinking that if I have the state numbers, I can parse the grammar to find the literal values and store them with the appropriate state number, so that when the user "enters" a particular state, the parser can check whether there is any text that can be automatically inserted for the user.
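
To make the idea concrete, the lookup could be as simple as a map from ATN state number to the text to insert. This is only a sketch: the class `CompletionTable`, its methods, and the state numbers 12 and 27 are made up for illustration and are not taken from any real grammar or from the ANTLR API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup table: ATN state number -> literal text to auto-insert. */
final class CompletionTable {
    private final Map<Integer, String> textByState = new HashMap<>();

    /** Registers the text to insert once the parser reaches the given state. */
    void register(int stateNumber, String text) {
        textByState.put(stateNumber, text);
    }

    /** Returns the text to insert for a state, or empty if nothing is registered. */
    Optional<String> lookup(int stateNumber) {
        return Optional.ofNullable(textByState.get(stateNumber));
    }

    public static void main(String[] args) {
        CompletionTable table = new CompletionTable();
        // The state numbers are placeholders, not taken from a real ATN.
        table.register(12, ";");   // e.g. a state reached after 'assert'
        table.register(27, "()");  // e.g. a state reached after 'if'
        System.out.println(table.lookup(12).orElse("<nothing to insert>"));
        System.out.println(table.lookup(99).orElse("<nothing to insert>"));
    }
}
```

The open question is still how to obtain the real state numbers and relate them to positions in the grammar.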

MyiEye

2 Answers


This is not possible with the given grammar, because the grammar only describes valid input.
You will therefore get errors when you parse input in which the user hasn't yet completed the statement (e.g. they have just typed 'assert'). You could of course rely on ANTLR's error recovery to handle that for you, but I would consider that a pretty "dirty" solution.

The alternatives you have (in my opinion) are:

  1. Write a grammar that matches the respective incomplete statements and decide, based on that parser, whether to insert a specific character.
  2. Handle the insertion process completely separately (which I would recommend), as it has nothing to do with parsing. If you want the completions to be updated automatically whenever you change your grammar, you need a program that writes the respective information from the grammar into a file, which you can then feed into your inserter.
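
As a rough illustration of the second option, here is a minimal sketch that pulls the always-required literals out of a single alternative of a rule. It is plain string scanning, not ANTLR: the class `MandatoryLiterals` and its `extract` method are hypothetical names, and the scanner deliberately ignores escapes inside literals, nested `|` alternatives, and actions.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: collect the literals of one grammar alternative that are always required. */
final class MandatoryLiterals {

    /**
     * Scans one alternative and returns the quoted literals that sit outside any
     * parenthesised group and carry no '?' or '*' suffix.
     */
    static List<String> extract(String alternative) {
        List<String> literals = new ArrayList<>();
        int depth = 0; // current nesting level of ( ... ) groups
        int i = 0;
        while (i < alternative.length()) {
            char c = alternative.charAt(i);
            if (c == '\'') {
                int end = alternative.indexOf('\'', i + 1);
                if (end < 0) break; // unterminated literal: stop scanning
                String literal = alternative.substring(i + 1, end);
                char suffix = end + 1 < alternative.length() ? alternative.charAt(end + 1) : ' ';
                if (depth == 0 && suffix != '?' && suffix != '*') {
                    literals.add(literal);
                }
                i = end + 1; // skip the literal, including any parentheses inside it
            } else {
                if (c == '(') depth++;
                else if (c == ')') depth--;
                i++;
            }
        }
        return literals;
    }

    public static void main(String[] args) {
        // prints [assert, ;]
        System.out.println(extract("'assert' expression (':' expression)? ';'"));
        // prints [if, (, )]
        System.out.println(extract("'if' '(' expression ')' statement ('else' statement)?"));
    }
}
```

The output of such a program could be written to a file and regenerated whenever the grammar changes. Deciding when to insert each literal is a separate problem that this sketch does not address.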
Raven
  • Thanks for taking the time to answer. Your second option is how I'm planning on storing the text to insert. The problem is choosing **when** to insert it. I think you're incorrect in saying it only parses valid input. Antlr4 is called Honey Badger, it takes whatever you give it ;). And I'd only need to parse up to where the user has typed in order to get the state number that matches that position in the grammar. – MyiEye Mar 25 '17 at 17:05
  • Hmmm well if that's the only need you want the parseTree for then it could work as ANTLR does indeed parse everything you feed it but generates error nodes in a way it thinks is suitable but that can be a little weird sometimes... – Raven Mar 25 '17 at 17:25
  • So do you happen to know how I can get the state numbers and figure out which parts of the grammar they refer to? – MyiEye Mar 25 '17 at 18:56
  • I'm afraid that I don't – Raven Mar 26 '17 at 14:12

Well, I played around with the API and it wasn't too difficult. Here's the code to insert all of the state numbers into a copy of a grammar file, either before or after the region of the grammar that has been recognized when the state is entered. Honestly, I'm not sure what it means when an interval is null; this seems to be the case for approximately a third of the states, and the code below simply skips them.

The code for inserting into a file is taken verbatim from xor_eq's answer.

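The result of this code is a copy of the grammar in which each state number appears as [n] at the corresponding position.

[Image: screenshot of the edited grammar with the inserted state numbers]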

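// Imports needed by this snippet (ANTLR 4 tool and runtime jars on the classpath).
// ANTLRv4Lexer is assumed to be a lexer generated from an ANTLR 4 grammar for .g4 files;
// import it from whatever package you generated it into.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

import org.antlr.runtime.RecognitionException; // ANTLR 3 exception type declared by Grammar's constructor
import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.atn.ATNState;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.tool.Grammar;

private static final String GRAMMAR_FILE_NAME = "JavaSimple.g4";
private static final String EDITED_GRAMMAR_FILE_NAME = "JavaSimple_edited.g4";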

private static void insertStateNumbersIntoGrammar() throws IOException, RecognitionException {
    copyGrammarFile();

    // Load tokens
    ANTLRInputStream input = new ANTLRFileStream(GRAMMAR_FILE_NAME);
    ANTLRv4Lexer lexer = new ANTLRv4Lexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();

    // Load Grammar
    String contents = new String(Files.readAllBytes(Paths.get(GRAMMAR_FILE_NAME)));
    Grammar g = new Grammar(contents);

    List<Insert> inserts = new ArrayList<>();
    // true: insert at the start of the region's first token; false: insert right after its last token
    boolean before = false;
    for (ATNState state : g.atn.states) {
        int stateNr = state.stateNumber;
        Interval interval = g.getStateToGrammarRegion(stateNr);
        if (interval != null) {
            Token token = before ? tokens.get(interval.a) : tokens.get(interval.b);
            int i = before ? token.getStartIndex() : token.getStopIndex() + 1;

            String stateStr = "[" + stateNr + "]";
            long insertSize = calcInsertLengthBefore(inserts, i);
            insert(EDITED_GRAMMAR_FILE_NAME, i + insertSize, stateStr.getBytes());
            inserts.add(new Insert(i, stateStr));
        }
    }
}

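// Total length of all earlier inserts that sit before the given index of the original file.
// Adding it to the index gives the matching position in the already edited file.
private static int calcInsertLengthBefore(List<Insert> inserts, int index) {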
    return inserts.stream()
            .filter(insert -> insert.index < index)
            .flatMapToInt(insert -> IntStream.of(insert.state.length()))
            .sum();
}

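// Inserts content at the given byte offset by moving the tail of the file
// into a temporary file ("<filename>~") and copying it back after the new content.
private static void insert(String filename, long offset, byte[] content) throws IOException {
    File tempFile = new File(filename + "~");
    try (RandomAccessFile r = new RandomAccessFile(new File(filename), "rw");
         RandomAccessFile rtemp = new RandomAccessFile(tempFile, "rw");
         FileChannel sourceChannel = r.getChannel();
         FileChannel targetChannel = rtemp.getChannel()) {
        long fileSize = r.length();
        long tailSize = fileSize - offset;
        // Save everything after the offset in the temporary file.
        sourceChannel.transferTo(offset, tailSize, targetChannel);
        sourceChannel.truncate(offset);
        // Write the new content, then append the saved tail.
        r.seek(offset);
        r.write(content);
        long newOffset = r.getFilePointer();
        targetChannel.position(0L);
        sourceChannel.transferFrom(targetChannel, newOffset, tailSize);
    } finally {
        // Remove the temporary file so it does not linger next to the grammar.
        tempFile.delete();
    }
}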

private static void copyGrammarFile() {
    File source = new File(GRAMMAR_FILE_NAME);
    File target = new File(EDITED_GRAMMAR_FILE_NAME);
    try {
        Files.copy(source.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private static class Insert {
    final int index;
    final String state;

    Insert(int index, String state) {
        this.index = index;
        this.state = state;
    }
}
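
An alternative to correcting every offset by the length of the earlier inserts (which is what calcInsertLengthBefore does) is to collect all insertions first and apply them from the highest index down, so that each index still refers to the original text. The following is a minimal sketch of that idea, not part of the code above: `InMemoryInserter` and `Insertion` are hypothetical names, and it works on character indices of an in-memory string rather than on byte offsets in a file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: apply many insertions to a text without tracking shifted offsets. */
final class InMemoryInserter {

    /** One pending insertion: text to place at a character index of the original. */
    static final class Insertion {
        final int index;
        final String text;

        Insertion(int index, String text) {
            this.index = index;
            this.text = text;
        }
    }

    /**
     * Applies the insertions from the highest index down, so every index still
     * refers to the original text at the moment it is used.
     */
    static String apply(String original, List<Insertion> insertions) {
        List<Insertion> sorted = new ArrayList<>(insertions);
        sorted.sort(Comparator.comparingInt((Insertion ins) -> ins.index).reversed());
        StringBuilder result = new StringBuilder(original);
        for (Insertion ins : sorted) {
            result.insert(ins.index, ins.text);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<Insertion> insertions = new ArrayList<>();
        insertions.add(new Insertion(1, "[1]"));
        insertions.add(new Insertion(7, "[2]"));
        // prints a[1] : 'x'[2] ;
        System.out.println(apply("a : 'x' ;", insertions));
    }
}
```

For a grammar file, the result could then be written out once with Files.write, which avoids rewriting the file for every state.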
MyiEye