3

Given an expression with operators, functions, and operands, such as:

2 + sin ( max ( 2, 3 ) / 3 * 3.1415 )

How can I programmatically validate the expression, such that any functions must have the correct number of parameters? For example abs,sin,cos must have exactly 1 parameter, whereas sum,avg,max,min have 2 or more.

Given that each parameter can itself be a very complicated expression, it seems non-trivial to programmatically determine this. I have already written a lexical tokenizer (lexer), and I've managed to convert the expression to postfix/RPN. (Which is: 2 3 max 3 / 3.1415 * sin 2 +). I am still no closer to a solution.

I would appreciate some code or pseudocode that will guide me in writing something from scratch. Java would be great.

Below is my lexer code:

    public static List<Token> shunt(List<Token> tokens) throws Exception {
    List<Token> rpn = new ArrayList<Token>();
    Iterator<Token> it = tokens.iterator();
    Stack<Token> stack = new Stack<Token>();
    while (it.hasNext()) {
        Token token = it.next();
        if (Type.NUMBER.equals(token.type))
            rpn.add(token);
        if (Type.FUNCTION.equals(token.type) || Type.LPAREN.equals(token.type)) 
            stack.push(token);
        if (Type.COMMA.equals(token.type)) {
            while (!stack.isEmpty() && !Type.LPAREN.equals(stack.peek().type))
                rpn.add(stack.pop());
            if (stack.isEmpty()) 
                throw new Exception("Missing left parenthesis!");
        }
        if (Type.OPERATOR.equals(token.type)) {
            while (!stack.isEmpty() && Type.OPERATOR.equals(stack.peek().type))
                rpn.add(stack.pop());
            stack.add(token);
        }
        if (Type.RPAREN.equals(token.type)) {
            while (!stack.isEmpty() && !Type.LPAREN.equals(stack.peek().type))
                rpn.add(stack.pop());
            if (stack.isEmpty()) 
                throw new Exception("Missing left parenthesis!");
            stack.pop();
            if (!stack.isEmpty() && Type.FUNCTION.equals(stack.peek().type))
                rpn.add(stack.pop());
        }
    }
    while (!stack.isEmpty()) {
        if (Type.LPAREN.equals(stack.peek().type) || Type.RPAREN.equals(stack.peek().type))
            throw new Exception("Mismatched parenthesis!");
        rpn.add(stack.pop());
    }

    return rpn;
}
bitsmcgee77
  • 391
  • 3
  • 10
  • One option is to use a tool like Javaluator, which parses such expressions. If there were a problem while parsing, I believe Javaluator would throw an exception. – Tim Biegeleisen Jan 05 '17 at 03:26
  • You may want to give more information about what, exactly, you're hoping to accomplish. Are you trying to write a compiler in JAVA? – Araymer Jan 05 '17 at 03:26
  • @Araymer Not a compiler, simply some code that validates a UI free-text field, within which a user can enter an expression just like the one above. – bitsmcgee77 Jan 05 '17 at 03:31
  • "I have already written a lexical tokenizer" so what's the output of that for the given example? A stream of tokens right? – weston Jan 05 '17 at 03:33
  • Weston are you implying I can use the lexer output to determine directly if the functions have the right number of parameters? – bitsmcgee77 Jan 05 '17 at 03:40
  • Sure, yes, that'll work. – Louis Wasserman Jan 05 '17 at 03:41
  • If you are managing to convert to postfix, you must have implemented some rudimentary parser as well. So you should already know how many arguments there are for each function - it shouldn't be too hard to look up the required number of arguments from a Map. – Erwin Bolwidt Jan 05 '17 at 03:43
  • Sorry, I see you have already RPN, why don't you show that too? Show where you've got too. – weston Jan 05 '17 at 03:56
  • @weston 2 3 max 3 / 3.1415 * sin 2 + – bitsmcgee77 Jan 05 '17 at 03:58
  • I presume your using [Shunting Yard Algorithm](https://en.wikipedia.org/wiki/Shunting-yard_algorithm)? Finding if the correct number of arguments are there would need to be done at that point. Please show your code. – weston Jan 05 '17 at 03:58
  • @weston Yes, that's right. I added the code to the question. – bitsmcgee77 Jan 05 '17 at 04:08

2 Answers2

1

What you want to do is implement a precise parser, that knows the exact syntax of your language (that includes "how many operators does a function have").

It is easy to write such a parser for expressions. See https://stackoverflow.com/a/2336769/120163

Community
  • 1
  • 1
Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • I did see that earlier. I'm ashamed to admit I still find it confusing. Could you give me an example in the context of my problem of counting function parameters? – bitsmcgee77 Jan 05 '17 at 04:22
  • You define a grammar that describes your expression. Rules describe various options for expressions. Some rules describe built in functions, such as sin and max. As an example, one might have grammar rules **T = 'sin' '(' exp ')' ;** and **T = 'max' '(' exp ',' exp ')' ;** where **T** is a term in the grammar. These grammar rules implicitly encode the correct number of parameters for the functions. A parser that enforces these rules, will then automatically enforce the correct number of parameters; if the parameter count is wrong, the parser will produce a syntax error. ... – Ira Baxter Jan 05 '17 at 04:29
  • Thanks very much, I will try to uses these sample rules to better understand your other post. I'm curious, what would the rule be if you could have 1+n parameters? – bitsmcgee77 Jan 05 '17 at 04:33
  • 1
    An alternative is to define a grammar rule **T = IDENTIFIER '(' expressions ')' ;** with **expressions = expression ( ',' expression ); ** This will allow the parser to accept sin(2,3) and max(1) which clearly isn't right. You solve this problem by adding an ad hoc check in the logic for the ** ... IDENTIFIER '(' expressions ') ... ** rule that looks up the identifier in a function-to-argument-count table, counts the number of ** exp ** in ** expressions **, and complains when they don't match. You use this technique if you have many, many functions. If you have only a few, use previous. – Ira Baxter Jan 05 '17 at 04:34
  • "n+1" parameters? You mean, the function has n parameters of one style, and one additional one? If n is a constant, you check for n+1 as a constant. Otherwise you can write at rule **T = nplus1args '(' expressions ',' exp ')' ;** and verify that the number of expressions is "n". – Ira Baxter Jan 05 '17 at 04:36
  • Yes I mean simply what would be the case for the function max or avg. You must have at least one param but you can have many, up to n. – bitsmcgee77 Jan 05 '17 at 04:38
  • If you use the syntax for the *nplus1args* rule, you'll get the effect you want. You don't even have to check for "n"; the syntax ensures there is at least one **exp**, and also ensures there is **expressions**, which has at least one **exp**. So you always get at least 2 arguments. – Ira Baxter Jan 05 '17 at 04:42
0

You either need to detect it during the Shunting Yard. A quick idea would be on the operator stack, keep a counter against each element. Count the number of commas detected. Then either on a close parenthesis or just at the end, check the number of arguments against each function entry.

An alternative might be to keep some more of the information as additional values for your RPN. e.g. keep the commas, you then get:

2 , 3 max 3 / 3.1415 * sin 2 +

When processing a function, it not only must eat values from the stack it must also eat the correct number of ,s. And too many will show itself later on.

I fear that way has some edge cases though like this; so probably better a precise parser.

sin(1,2) * max (3)

1 , 2 sin 3 max *
weston
  • 54,145
  • 21
  • 145
  • 203