I am creating an infix expression parser, and so I have to create a tokenizer. It works well, except for one thing: I do not know how to differentiate a negative number from the "-" operator.

For example, if I have:

23 / -23

The tokens should be 23, / and -23, but if I have an expression like

23-22

Then the tokens should be 23, - and 22.

I found a dirty workaround: when I encounter a "-" followed by a number, I look at the previous character, and if that character is a digit or a ')', I treat the "-" as an operator rather than as part of a number. Apart from being kind of ugly, it doesn't work for expressions like

--56

where it produces the tokens - and -56, when it should produce --56.

Any suggestions?

Brendan Rius
  • 3
    Well, the C grammar does not have negative integer constants; it is all unary minus. That seems like the simpler approach. – Shafik Yaghmour Oct 23 '14 at 13:55
  • 2
    That would normally not be the job of a tokenizer, you'll have to figure this out at the syntax level. – nos Oct 23 '14 at 13:59

1 Answer

In the first example the tokens should be 23, /, - and 23.

The solution is then to let the parser evaluate the tokens according to the rules of associativity and precedence. For example, the - cannot bind to the / on its left, but it can bind to the 23 on its right, so it is treated as a unary minus.

If you encounter --56, it is split into -, - and 56, and the same rules take care of the problem. There is no need for special cases.
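
As a rough illustration of this approach, here is a minimal sketch in Python (the question does not say which language is used, so the language and the names `tokenize` and `evaluate` are assumptions made for the example). The tokenizer always emits "-" as its own token, and a small recursive-descent parser decides from the grammar whether a given "-" is binary or unary.

```python
import re

# Assumed token set for this sketch: unsigned numbers, or a single
# operator / parenthesis character. A '-' is never part of a number.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    """Split text into tokens; '-' is always emitted as its own token."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Parse and evaluate an infix expression with unary minus."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():  # expr := term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()  # '-' after an operand is binary
        return value

    def term():  # term := unary (('*' | '/') unary)*
        value = unary()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= unary()
            else:
                value /= unary()
        return value

    def unary():  # unary := '-' unary | primary
        if peek() == "-":
            advance()
            return -unary()  # '-' where an operand is expected is unary
        return primary()

    def primary():  # primary := NUMBER | '(' expr ')'
        token = advance()
        if token == "(":
            value = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or not token[0].isdigit():
            raise ValueError(f"unexpected token {token!r}")
        return float(token)

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

With this sketch, `tokenize("23 / -23")` gives `['23', '/', '-', '23']` and `tokenize("--56")` gives `['-', '-', '56']`. The parser then applies the minus signs in `unary`, without looking back at the previous character.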

2501
  • 3
    @2501: one special case which you might need to worry about is the constant `-(MAXINT+1)`, which is MININT in 2s complement architectures, but involves a constant too large for a positive integer. That means that the parser rather than the lexer needs to be responsible for error-checking, which must be done as part of constant folding in order to correctly flag out-of-bounds integer literals. – rici Oct 23 '14 at 17:11