JavaCC using input as a 'token'

Question

I've been puzzling over this for days and searching doesn't seem to give any results. Makes me wonder if it's possible. For example:

funct functionNAME (Object o) { o+1 };

The point is that the user has to use the identifier 'o' within the curly braces and not some other identifier. This is of course specified by the input in the (Object o) part, where 'o' can be anything. Basically, the identifier within the curly braces must be the same as the identifier defined in the parameter. I know I can store the matched token and print it out to the screen, but is it possible to use it as a lexical token itself? Thanks.

are you asking about Reflection? http://stackoverflow.com/questions/37628/what-is-reflection-and-why-is-it-useful — Aboutblank, Mar 28 '13 at 17:12
If you are writing a compiler with JavaCC, what you want to do is maintain a symbol table that keeps track of what identifiers can be used at each point in the code. Symbol tables generally also keep other useful information about identifiers, such as what they represent (e.g. variable vs function) and what their type is. — Theodore Norvell, Mar 29 '13 at 03:30
Can you clarify the question? What I get from your last sentence is: "Can I use [the matched token] as a lexical token?" But I suspect that is not what you meant. — Theodore Norvell, Mar 29 '13 at 21:38
You know how you can either define a token e.g. LETTER_A : {"a"} or define what the parser can accept through tokens e.g. | , what I want to say is that what the parser can accept is dependent on the input, meaning it is not set by me beforehand e.g. USER_INPUT : {user input}. I want to store this input and use it like that. I don't know what they are going to put other than it will be a string, it could be a single 'y' character for example. Is it clear? I know it's quite an odd question since I haven't found an answer anywhere. — Kevin Lee, Mar 29 '13 at 23:32

Theodore Norvell · Accepted Answer · 2013-04-02T11:58:00.007

Yes there is a better way to do this. You need a symbol table. The job of a symbol table is to keep track of which identifiers can be used at each point in the program. Generally the symbol table also contains other information about the identifiers, such as what they represent (e.g. variable or function name) and what their types are.

Using a symbol table you can detect the use of variables that are not in scope during parsing for many languages but not all. E.g. C and Pascal are languages where identifiers must be declared before they are used (with a few exceptions). But other languages (e.g. Java) allow identifiers to be declared after they are used and in that case it is best not to try to detect errors such as use of an undeclared variable until after the program is parsed. (Indeed in Java you need to wait until all files are parsed, as identifiers might be declared in another file.)
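For the declare-after-use case, one way to picture the deferred check is to record declarations and uses while parsing and compare them only once parsing has finished. The following is a minimal sketch in plain Java; the class `DeferredChecker` and its method names are hypothetical, and it assumes a single flat namespace with no scoping.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JavaCC: collects names during parsing
// and reports undeclared uses only after the whole input has been seen.
class DeferredChecker {
    private final Set<String> declared = new HashSet<>();
    private final List<String> used = new ArrayList<>();

    // Called from a parser action when a declaration is matched.
    void declare(String name) { declared.add(name); }

    // Called from a parser action when an identifier is used.
    void use(String name) { used.add(name); }

    // Called once after parsing: every used name that was never declared.
    List<String> undeclaredUses() {
        List<String> missing = new ArrayList<>();
        for (String name : used) {
            if (!declared.contains(name)) missing.add(name);
        }
        return missing;
    }
}
```

Because the comparison happens at the end, a use that appears before its declaration is not reported as an error.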

I'll assume a simple scenario, which is that you only need to record information about variables, that there is no type information, and that things must be declared before use. That will get you started. I haven't bothered about adding the function name to the symbol table.

Suppose a symbol table is a stack of things called frames. Each frame is a mutable set of strings. (Later you may want to change that to a mutable map from strings to some additional information.)

void Start(): { }
{
    <FUNCTION>
    <IDENTIFIER>
    {symtab.pushNewFrame() ;}
    <LBRACKET> Parameters()  <RBRACKET> 
    <LBRACE> Expression() <RBRACE>
    {symtab.popFrame() ; }
}
void Parameters() : {}
{
     ( Parameter() (<COMMA>   Parameter() )* )?
}
void Parameter() : {  Token x ; }
{
    <OBJECT> x=<IDENTIFIER>
    { if( symtab.topFrame().contains(x.image) ) reportError( ... ) ; }
    { symtab.topFrame().add(x.image) ; }
}
void Expression() : {  }
{
    Exp1() ( <PLUS> Exp1() )*
}
void Exp1() : { Token y ; }
{
    y = <IDENTIFIER>
    { if( ! symtab.topFrame().contains(y.image) ) reportError( ... ) ; }
|
    <NUMBER>
}
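The grammar above assumes that a `symtab` object exists. A minimal sketch of such a class in plain Java might look like the following. The method names `pushNewFrame`, `popFrame` and `topFrame` come from the grammar; the class name `SymbolTable` and the choice of `ArrayDeque` and `HashSet` are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical symbol table: a stack of frames, each frame a mutable set of names.
class SymbolTable {
    private final Deque<Set<String>> frames = new ArrayDeque<>();

    // Enter a new scope, e.g. at the start of a function's parameter list.
    void pushNewFrame() { frames.push(new HashSet<>()); }

    // Leave the current scope; throws NoSuchElementException if no frame exists.
    void popFrame() { frames.pop(); }

    // The innermost frame; throws NoSuchElementException if no frame exists.
    Set<String> topFrame() { return frames.getFirst(); }
}
```

In a JavaCC grammar file, a field such as `symtab` would typically be declared in the parser class between PARSER_BEGIN and PARSER_END, and would need to be static if the parser is generated as a static parser.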

It depends on the nature of the compiler/language processor. For a one-pass compiler, you never need access to a symbol after it has gone out of scope; in this case, a stack of frames makes sense. For example, in your language, at the end of each subroutine the parameters are no longer needed, so you can pop the top frame off the symbol table stack. For a multi-pass compiler, the symbol table will be needed in future passes, so keeping the symbol table frames in a tree structure makes more sense. — Theodore Norvell, Apr 09 '13 at 10:05
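The tree-structured alternative mentioned in the comment above could be sketched roughly as follows, in plain Java. The class `Scope` and its members are hypothetical names. The lookup that walks up to enclosing scopes is also an added assumption, since the grammar in the answer only consults the top frame.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical scope node: frames are kept in a tree instead of being popped,
// so that later passes can still reach them.
class Scope {
    final Scope parent;                        // enclosing scope, or null for the root
    final List<Scope> children = new ArrayList<>();
    final Set<String> names = new HashSet<>();

    Scope(Scope parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    // True if the name is declared in this scope or in any enclosing scope.
    boolean isVisible(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            if (s.names.contains(name)) return true;
        }
        return false;
    }
}
```

Leaving a scope then just means moving the current-scope reference back to `parent`; the child stays attached to the tree.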

score 0 · Answer 2 · answered Mar 28 '13 at 17:11

You can store the value of the identifier matching o, then check inside the curly braces whether the identifier there is the same and, if not, throw an exception.

gefei


Yes I am aware that is possible but that is as far as I have gotten with it. I can't then figure out how to use it as a 'token'. For example: void showO(): {} { o.image } I know that code doesn't work but I am looking for something like that. – Kevin Lee Mar 28 '13 at 17:17

score 0 · Answer 3 · answered Mar 31 '13 at 15:56

Okay, I have worked out a way to get what I want, based on the example I gave in the OP. It is a simplified variant of the solution I implemented in my own grammar, given just as a proof of concept. Trivial things such as token definitions are left out for simplicity.

void Start():
{
    Token x, y;
}
{
    <FUNCTION>
    <FUNCTION_NAME>
    <LBRACKET>
    <OBJECT>
    x = <PARAMETER>
    <RBRACKET>
    <LBRACE>
    y = <PARAMETER>
    {
        if (!x.image.equals(y.image))
        {
            System.out.println("Identifier must be specified in the parameters.");
            System.exit(1); // non-zero exit status signals the error
        }
    }
    <PLUS>
    <DIGIT>
    <RBRACE>
    <COLON>
}

Is there a better way to do this?
