How to differentiate between 'int' and 'float' datatypes while using SLY in Python

Question

I am using the Sly package in Python to make a dummy lexer. The documentation only shows numeric tokens being identified as a single NUMBER token, which we, the programmers, define ourselves:

@_(r'\d+')
def NUMBER(self, t):
    t.value = int(t.value)
    return t

I have tried adding separate functions for FLOAT and INT types instead, but I ran into some complications:

I can't find a way to read the input and identify it as a 'float' or an 'int' data type. I experimented with the various things the lexer yields, with no luck.

I am now thinking of manipulating the regex somehow, but I have had no luck with that either. I can't make it recognize a float by placing a dot between two digits like this:

@_(r'\d+.\d+') # this Expression
def FLOAT(self, t):
    t.value = int(t.value)
    return t

Anyway, the output I want is something like this:

#input string
123 1.23
#output string (this is a simplified version of the original output)
TOKEN:123; ID:0; TYPE:int-datatype
TOKEN:1.23; ID:1; TYPE:float-datatype

NOTE: I am very new to Sly

`([0-9]*\.[0-9]+|[0-9]+\.?)([Ee][+-]?[0-9]+)?` Try this regex for floats. I'm unsure if it works with the way Sly forwards numbers, so let me know if it does and I can post it as an answer with an explanation. - Skully, Oct 25 '21 at 19:31
This is the output: `>> 123 Token(type='INT', value=123, lineno=1, index=0) >> 1.23 Token(type='INT', value=1, lineno=1, index=0)` and it also throws a **ValueError** exception: `ValueError: invalid literal for int() with base 10: '.23'`. I guess we will have to write separate regexes for the integer part and the fractional part of a float, then join them afterwards. At this point anything goes. - Muhammad Hammad Hassan, Oct 25 '21 at 19:47
Have you tried changing the order in which you declare your INTEGER and FLOAT functions? The first to be declared will be tried first. So FLOAT then INTEGER means it will try to parse tokens as floats in preference to integers (rather than vice versa). Not sure if this is a conscious design decision or just an implementation detail, though. - Dunes, Oct 25 '21 at 21:39
The documentation indicates that it is intentional: "Tokens are matched in the same order that patterns are listed in the Lexer class." - Sean Duggan, Aug 28 '22 at 06:07

score 1 · Answer 1 · answered Sep 08 '22 at 14:27

Just to quickly compile the details in comments into a visible answer:

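`([0-9]*\.[0-9]+|[0-9]+\.?)([Ee][+-]?[0-9]+)?` will match a float value. Note that its `[0-9]+\.?` branch also matches a plain integer such as `123`, so the decimal point or exponent would need to be made mandatory if integers are to stay INT tokens.
You would have to check for the float value first, because Sly goes through lexing rules in order (which is also why, if you have a `==` literal and a `=` literal, you want to list them in that order rather than the reverse). The FLOAT rule should also convert with `float(t.value)` rather than `int(t.value)`.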
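
As a rough illustration of the two points above, here is a minimal sketch that uses only the standard `re` module to emulate ordered rule matching; it is not Sly itself. The names `FLOAT_RE`, `INT_RE`, `RULES`, and `tokenize` are hypothetical, and `FLOAT_RE` is a stricter pattern swapped in for the one from the comments (kept here as `COMMENT_RE`), on the assumption that a float must contain a decimal point or an exponent.

```python
import re

# Pattern suggested in the comments, kept only to show that it also
# accepts bare integers.
COMMENT_RE = r'([0-9]*\.[0-9]+|[0-9]+\.?)([Ee][+-]?[0-9]+)?'
print(bool(re.fullmatch(COMMENT_RE, '123')))   # True: "123" counts as a float
# The question's pattern leaves the dot unescaped, so "." matches any character.
print(bool(re.fullmatch(r'\d+.\d+', '1x3')))   # True

# Assumed stricter float pattern: needs a decimal point and/or an exponent.
FLOAT_RE = r'(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+'
INT_RE = r'\d+'

# Rules are tried in list order, mirroring the documented Sly behaviour
# that patterns are matched in the order they are listed in the class.
RULES = [
    ('FLOAT', re.compile(FLOAT_RE), float),  # convert with float(), not int()
    ('INT', re.compile(INT_RE), int),
]

def tokenize(text):
    """Yield (type, value) pairs, trying RULES in order at each position."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():      # skip whitespace between tokens
            pos += 1
            continue
        for name, pattern, convert in RULES:
            m = pattern.match(text, pos)
            if m:
                yield name, convert(m.group())
                pos = m.end()
                break
        else:
            raise ValueError(f'illegal character {text[pos]!r} at index {pos}')

for i, (tok_type, value) in enumerate(tokenize('123 1.23')):
    print(f'TOKEN:{value}; ID:{i}; TYPE:{tok_type}')
```

In a Sly lexer the same idea would presumably translate to putting these two patterns in `@_(...)` decorators, with the FLOAT method defined before the INT method and the FLOAT method setting `t.value = float(t.value)`.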
