How to differentiate '-' operator from a negative number for a tokenizer

Question

I am creating an infix expression parser, and so I have to create a tokenizer. It works well, except for one thing: I do not know how to differentiate a negative number from the "-" operator.

For example, if I have:

23 / -23

The tokens should be 23, / and -23, but if I have an expression like

23-22

Then the tokens should be 23, - and 22.

I found a dirty workaround: if I encounter a "-" followed by a number, I look at the previous character, and if that character is a digit or a ')', I treat the "-" as an operator rather than as part of a number. Apart from being kind of ugly, it doesn't work for expressions like

--56

where it gets the following tokens: - and -56, where it should get --56.

Any suggestions?

Well, the C grammar does not have negative integer constants; it is all unary minus. That seems like the simpler approach. – Shafik Yaghmour, Oct 23 '14 at 13:55
That would normally not be the job of a tokenizer; you'll have to figure this out at the syntax level. – nos, Oct 23 '14 at 13:59

2501 · Accepted Answer · 2014-10-23T14:05:12.413

10

In the first example the tokens should be 23, /, - and 23.

The solution then is to evaluate the tokens according to the rules of associativity and precedence. For example, the - cannot bind to the / on its left, but it can bind to the 23 on its right, so it is treated as a unary minus.

If you encounter --56, it is split into -, -, 56 and the rules take care of the problem. There is no need for special cases.
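As a rough illustration of that approach, here is a minimal sketch in Python (chosen only for brevity; the question does not name a language). The names `tokenize` and `evaluate`, and the small grammar of integers, `+ - * /` and parentheses, are assumptions made for this example. The tokenizer always emits - as its own token, and a recursive-descent parser decides from position whether it is unary or binary.

```python
import re

# Hypothetical token pattern: unsigned integers and single-character operators.
# A '-' is never merged into a number here.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split text into tokens; '-' is always a separate token."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(text):
    """Evaluate an infix expression; the parser resolves unary vs binary '-'."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr := term (('+' | '-') term)*   -- here '-' is binary
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # term := unary (('*' | '/') unary)*
        value = unary()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= unary()
            else:
                value /= unary()
        return value

    def unary():
        # unary := '-' unary | primary       -- here '-' is unary
        if peek() == "-":
            advance()
            return -unary()
        return primary()

    def primary():
        # primary := NUMBER | '(' expr ')'
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

With this sketch, `tokenize("--56")` gives the three tokens -, -, 56, and the `unary` rule applies the two negations in turn. A - that appears where an operand is expected is unary; one that appears after a complete operand is binary.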

edited Oct 23 '14 at 14:05

answered Oct 23 '14 at 13:56

2501


3

@2501: one special case which you might need to worry about is the constant `-(MAXINT+1)`, which is MININT in 2s complement architectures, but involves a constant too large for a positive integer. That means that the parser rather than the lexer needs to be responsible for error-checking, which must be done as part of constant folding in order to correctly flag out-of-bounds integer literals. – rici Oct 23 '14 at 17:11
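The range check described in this comment could look roughly like the following sketch. It assumes a 32-bit two's-complement target and a hypothetical helper `fold_literal` that the parser calls while folding a literal together with an optional directly applied unary minus; neither the width nor the name comes from the thread.

```python
# Assumed target range: 32-bit two's-complement integers.
INT_MAX = 2**31 - 1
INT_MIN = -INT_MAX - 1

def fold_literal(digits, negated):
    """Fold a decimal literal, optionally under one unary minus, and range-check it.

    The check happens after negation, so the magnitude INT_MAX + 1 is
    accepted only when it is negated.
    """
    magnitude = int(digits)
    value = -magnitude if negated else magnitude
    if not INT_MIN <= value <= INT_MAX:
        sign = "-" if negated else ""
        raise OverflowError(f"integer literal out of range: {sign}{digits}")
    return value
```

Under these assumptions, `fold_literal("2147483648", True)` is accepted because the folded value is INT_MIN, while `fold_literal("2147483648", False)` raises `OverflowError`. A lexer that checked the unsigned literal on its own would have to reject both.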
