My problem boils down to the following, which can occur inside a large regex: 1. is a number, but 1.. is two tokens, consisting of 1 as a number and .. as an operator.

The definition of a number in the Wolfram Language is very complex (I append the JFlex code at the end), and I basically need the (?!...) operator in a deeply nested construct. However, JFlex seems to support negative lookahead only on a per-rule basis, which means I would need to expand my definitions manually.

So what I want is for numbers not to consume the . when it is followed by another ., because in the Wolfram Language the two dots are then parsed as an operator (sigh).
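To make the desired behavior concrete, here is a minimal sketch using java.util.regex rather than JFlex. The pattern is a deliberately simplified stand-in for the Number definition (no precision, base, or exponent parts), and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotLookaheadSketch {
    // Simplified stand-in for Number: digits with an optional fractional
    // part, or a leading dot followed by digits. The (?!\.) right after the
    // dot refuses to consume a dot that is followed by another dot.
    private static final Pattern NUMBER =
        Pattern.compile("[0-9]+(?:\\.(?!\\.)[0-9]*)?|\\.[0-9]+");

    /** Returns the number matched at the start of input, or "" if none. */
    public static String leadingNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.", "1..", "1.5", ".5"}) {
            System.out.println(s + " -> " + leadingNumber(s));
        }
    }
}
```

With this pattern, 1. is matched as a whole, while for 1.. only the 1 is matched and the two dots are left for the operator rule.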

I have prepared an example that shows the entire number representation as a normal regex, with the negative look-ahead included, together with some example numbers.

Can someone tell me how I can do this in JFlex?

Here is the relevant JFlex code; the full definitions are available here

Digits = [0-9]+
Digits2 = [0-9a-zA-Z]+
Base = 2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36
Number = {Digits}((\.){Digits}?)? | \.{Digits}
PrecisionNumber = {Number}`((`?){Number})?
BaseNumber = {Base} "^^" {Digits2}(\.{Digits2}?)?
BasePrecisionNumber = {BaseNumber}((`{Number}?)|(``{Number}))
ScientificInteger = {Number} "\*^"(-?){Digits}
ScientificNumber = {PrecisionNumber} "\*^"(-?){Digits}
BaseScientificNumber = {BasePrecisionNumber} "\*^"(-?){Digits}

{BaseScientificNumber}|
{BasePrecisionNumber}|
{ScientificInteger}|
{BaseNumber}|
{ScientificNumber}|
{PrecisionNumber}|
{Number}            { return WLElementTypes.NUMBER; }
halirutan

1 Answer

I'm not sure whether this is feasible in your case, but my first reaction to this kind of problem is usually to shift it one level up from the lexer. That is, instead of returning a single lexer token NUMBER, I'd return the constituents of a number, e.g. {Digits}, ".", "^^", etc., and then put them together either in the grammar of the parser (if there is one) or in the parsing engine that calls the lexer.

A typical LR or LL engine on top of the lexer can deal much better with look-ahead and context; in your example, everything below Base might already go into the parser instead of the lexer.
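As a rough illustration of this split, the following Java sketch uses a hand-written lexer that only emits digit runs, "..", "." and single other characters, and a small assembly step that glues them into numbers. It covers only the plain Number form (no precision, base, or exponent parts); all class and method names are hypothetical, and a real setup would use the JFlex-generated lexer and the parser's grammar instead.

```java
import java.util.ArrayList;
import java.util.List;

public class NumberAssemblySketch {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Digit runs are the only tokens that start with a digit.
    private static boolean isDigits(String t) {
        return !t.isEmpty() && isDigit(t.charAt(0));
    }

    /** Lexer level: digit runs, "..", and single characters (including "."). */
    public static List<String> lex(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            if (isDigit(s.charAt(i))) {
                while (i < s.length() && isDigit(s.charAt(i))) i++;
            } else if (s.startsWith("..", i)) {
                i += 2; // longest match: ".." wins over "."
            } else {
                i++;
            }
            out.add(s.substring(start, i));
        }
        return out;
    }

    /** Parser level: DIGITS ["." [DIGITS]] and "." DIGITS become numbers. */
    public static List<String> assemble(List<String> toks) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            String t = toks.get(i);
            if (isDigits(t)) {
                StringBuilder n = new StringBuilder(t);
                i++;
                if (i < toks.size() && toks.get(i).equals(".")) {
                    n.append('.');
                    i++;
                    if (i < toks.size() && isDigits(toks.get(i))) {
                        n.append(toks.get(i++));
                    }
                }
                out.add("NUMBER(" + n + ")");
            } else if (t.equals(".") && i + 1 < toks.size()
                    && isDigits(toks.get(i + 1))) {
                out.add("NUMBER(." + toks.get(i + 1) + ")");
                i += 2;
            } else {
                // Whitespace is dropped; everything else is passed through.
                if (!t.trim().isEmpty()) out.add("OP(" + t + ")");
                i++;
            }
        }
        return out;
    }

    public static String parse(String s) {
        return String.join(" ", assemble(lex(s)));
    }
}
```

Because the lexer prefers ".." over ".", the input 1.. never yields a lone "." after the digits, so the assembly step sees a number followed by an operator without needing any negative lookahead.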

At least if you want to compute further with the value of the number, you'd need to analyse the matched text in more detail anyway, because the number format is so complex, so from that angle you wouldn't lose anything.
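For instance, even with a single NUMBER token, getting the value of a base literal means taking the text apart again. A minimal Java sketch, assuming integer literals only (no fractional part, precision, or exponent) and using a hypothetical helper name:

```java
public class BaseNumberValueSketch {
    /**
     * Value of an integer literal such as "42" or "16^^ff".
     * Assumes the base is in 2..36 and the digits are valid for that base.
     */
    public static long valueOf(String literal) {
        int sep = literal.indexOf("^^");
        if (sep < 0) {
            return Long.parseLong(literal); // plain decimal integer
        }
        int base = Integer.parseInt(literal.substring(0, sep));
        // Long.parseLong accepts upper- and lower-case letters as digits.
        return Long.parseLong(literal.substring(sep + 2), base);
    }
}
```

If the lexer already hands over the base, the "^^" separator, and the digits as separate tokens, this splitting step comes for free.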

lsf37
  • Thank you very much for your answer. Returning the single parts of my numbers and letting the parser take care was my fallback-plan :) – halirutan Feb 28 '19 at 10:51