buf = []
token = []

def read():
    # Read the first line of the file and split it on whitespace
    # (split() with no argument also drops the trailing newline).
    with open('lex.txt', 'r') as f:
        return f.readline().split()

lex = read()

def operator(i):
    op = ['+', '-', '/', '*', '<', '>', '>=', '<=']
    if i in op:
        buf.append(i)
        token.append('RELOP')
        return True

def error(i):
    digit = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
    try:
        if i[0] in digit:
            buf.append(i)
            token.append('ERROR')
            return True
    except IndexError:
        pass

def keyword(i):
    keywords = ['if', 'while', 'for']
    if i in keywords:
        buf.append(i)
        x = i.upper()
        token.append(x + '_TOKEN')
        return True

def ident(i):
    alph = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    try:
        if i[0] in alph:
            buf.append(i)
            token.append('ID')
            return True
    except IndexError:
        pass

def floati(i):
    res = i.replace('.', '', 1).isdigit()
    if res:
        buf.append(i)
        token.append('FLOAT')
        return True

def integ(i):
    try:
        int(i)
        buf.append(i)
        token.append('INTEGER')
        return True
    except ValueError:
        pass

for i in lex:
    if keyword(i) or operator(i) or error(i) or floati(i) or integ(i) or ident(i):
        # token[-1] is the token appended by whichever check succeeded
        print(token[-1])
    else:
        print('Unrecognized token:', i)

I am trying to write a lexical analyzer in Python. I want to check whether each token read from a file is a float or an integer, but when I check for floats (for example with float()), my code also classifies integer strings as floats. I do not want to check the string using float() or int(). I want to append the string to a list without changing it.

The input in lex.txt is: `x 45 5.4 -33 size33 34RR if <= while x`

Nathaniel Ford
Azqaf
  • What's in `lex.txt`, and what is the expected output of your code? – BrokenBenchmark Apr 14 '22 at 20:48
  • Does this answer your question? https://stackoverflow.com/questions/4843173/how-to-check-if-type-of-a-variable-is-string – Leon Menkreo Apr 14 '22 at 20:48
  • The `lex` function is empty. Is the rest of it supposed to be nested functions? Please fix the indentation. – Barmar Apr 14 '22 at 20:48
  • Lexical analyzers usually use regular expressions in their parsers to recognize different types of literals. – Barmar Apr 14 '22 at 20:49
  • @LeonMenkreo - that's a different question - that simply reports whether it is type "string". THIS question is whether the string is a valid representation of a number. – ToolmakerSteve Apr 14 '22 at 20:49
  • This is the input: `x 45 5.4 -33 size33 34RR if <= while x` – Azqaf Apr 14 '22 at 20:51
  • I have always found this an interesting topic. Do you intend to require that your tokens be separated by whitespace, so you can use `.split`? If so, then using regexes on those tokens is easy: `r"\d+"` for integers, `r"\d+(\.\d*)?(E\d+)?"` for floats. There are several good lexical parser modules for Python. – Tim Roberts Apr 14 '22 at 20:52
  • You should modify the question to include your sample input. Presumably, "34RR" is a syntax error. – Tim Roberts Apr 14 '22 at 20:52
  • *"I do not want to check string using float() or int()"* - why not? You don't have to *use* the result of the conversion - just check whether it throws an error or not. Define a function that does the check - similar to https://stackoverflow.com/q/354038/199364 - but add a check for int also. – ToolmakerSteve Apr 14 '22 at 20:53
  • @TimRoberts Sir, I do not want to use regexes. – Azqaf Apr 14 '22 at 20:54
  • Note that `alph = "abcdefghijkl..."` is a lot easier to type than `alph = ['a','b','c','d',...]` and works exactly the same. – Tim Roberts Apr 14 '22 at 20:55
  • It converts an integer to a float when I want to check it. – Azqaf Apr 14 '22 at 20:56
  • @ToolmakerSteve Sir, when I check whether an integer string is a float, it converts the integer to a float. – Azqaf Apr 14 '22 at 20:57
  • I don't understand the problem. If you want an integer to be seen as an integer, then you check for integer FIRST - treat it as an integer in that case. – ToolmakerSteve Apr 14 '22 at 21:02
  • You must check for integer BEFORE you check for float, otherwise all integers will be seen as floats. And I don't know what your "error" is doing; you will never get any true integers in this process. – Tim Roberts Apr 14 '22 at 21:04
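Combining two suggestions from these comments (Tim Roberts' regexes, tried in the order ToolmakerSteve and Tim describe, integer first) gives something like the sketch below. It assumes whitespace-separated tokens and unsigned numbers, since neither pattern accepts a leading sign; `INT_RE`, `FLOAT_RE` and `classify` are illustrative names, and the asker has said they would rather avoid regexes.

```python
import re

# Patterns from the comment above; neither one accepts a leading sign.
INT_RE = re.compile(r"\d+")
FLOAT_RE = re.compile(r"\d+(\.\d*)?(E\d+)?")


def classify(tok):
    # Integers must be tried first: FLOAT_RE also matches a plain digit run.
    if INT_RE.fullmatch(tok):
        return "INTEGER"
    if FLOAT_RE.fullmatch(tok):
        return "FLOAT"
    return "OTHER"


for tok in "x 45 5.4 -33 size33 34RR".split():
    print(tok, classify(tok))
```

The order matters because both groups in `FLOAT_RE` are optional, so `45` matches it as well as `INT_RE`.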

4 Answers


A simple way to check if a string is int or float:

def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

Then:

>>> is_int('-73')
True

>>> is_int('2.3')
False

But note, the above is_int() only checks for int. If not, it is not necessarily a float:

>>> is_int('foo')
False

But you could easily duplicate that pattern for float.
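Duplicating that pattern for float and checking int first might look like this minimal sketch (`is_float` and `classify` are hypothetical names). The conversion results are thrown away, so the token string itself is never modified and can be appended to a list unchanged.

```python
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False


def is_float(s):
    # Same pattern as is_int(); float() also accepts integer strings such as "45".
    try:
        float(s)
        return True
    except ValueError:
        return False


def classify(s):
    # Try int first, otherwise every integer would be reported as a float.
    if is_int(s):
        return "INTEGER"
    if is_float(s):
        return "FLOAT"
    return "OTHER"


tokens = []
for s in ["45", "-33", "5.4", "size33"]:
    tokens.append((s, classify(s)))  # s is stored exactly as it was read
print(tokens)
# [('45', 'INTEGER'), ('-33', 'INTEGER'), ('5.4', 'FLOAT'), ('size33', 'OTHER')]
```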

Pierre D

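The built-in str type offers the isdigit() method, which returns True if the string is non-empty and every character in it is a digit. Note that it returns False for signed or decimal strings such as "-33" or "5.4".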

str.isdigit("5")
# True
str.isdigit("five")
# False

For float type, you could write a short function using try-except:

def check_float(string):
    try:
        float(string)
        return True
    except ValueError:
        return False
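Neither check is an exact match for what a lexer usually wants. The short sketch below reuses `check_float` unchanged and shows inputs that `float()` accepts beyond plain decimals, plus cases where `str.isdigit()` disagrees with what `int()` accepts. Since `check_float("45")` is True, integers still have to be checked first.

```python
def check_float(string):
    try:
        float(string)
        return True
    except ValueError:
        return False


# float() accepts far more than plain decimal literals, so check_float() does too.
# Every one of these prints True, including the integer "45".
for s in ["45", "5.4", ".5", "5.", "1e3", "inf", "nan", " 7 ", "1_000"]:
    print(repr(s), check_float(s))

# str.isdigit() rejects signs and decimal points, and it accepts some
# non-ASCII digit characters that int() does not.
print("-33".isdigit())     # False
print("5.4".isdigit())     # False
print("\u00b2".isdigit())  # True (superscript two), yet int("\u00b2") raises ValueError
```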
Leon Menkreo
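from re import fullmatch

strs = ["x", "45", "5.4", "-33", "size33", "34RR", "if", "18", "18.5", "s", "-35"]
for s in strs:
    # Optional minus, then either digits, or optional digits + "." + digits
    # (so a bare fraction such as ".5" is accepted, but "" and "-" are not).
    if fullmatch(r"-?(?:[0-9]+|[0-9]*\.[0-9]+)", s):
        if s[0] == "-":
            s = s[1:]
        if s.isdigit():
            print("int")
        else:
            print("float")
    else:
        print("other")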
MoRe

I think this does what you want, without using regexes. Notice that this checks for integers before floats and also handles negative floats, which yours did not. I've removed the call to error(i), since it was pointless.

def read():
    with open('lex.txt', 'r') as f:
        return f.read().split()

op=['+','-','/','*','==','!=','<','>','>=','<=']
def operator(i):
    if i in op:
        buf.append(i)
        token.append('RELOP')
        return True
    return False

# I don't know what this is doing.  `i` is a string.
def error(i):
    digit=[0,1,2,3,4,5,6,7,8,9,0]
    if i[0] in digit:
        buf.append(i)
        token.append('ERROR')
        return True
    return False
    
keywords=['if','while','for']
def keyword(i):
    if i in keywords:
        buf.append(i)
        x=i.upper()
        token.append(x+'_TOKEN')
        return True
    return False
        
alph='abcdefghijklmnopqrstuvwxyz'
def ident(i):
    if i[0] in alph:
        buf.append(i)
        token.append('ID')
        return True
    return False
        
def floati(i):
    tmp = i
    if tmp[0] in "+-":
        tmp = tmp[1:]
    if tmp.replace('.', '', 1).isdigit():
        buf.append(i)
        token.append('FLOAT')
        return True
    return False
        
def integ(i):
    if i.isdigit() or i[0] in "+-" and i[1:].isdigit():
        buf.append(i)
        token.append('INTEGER')
        return True
    return False

def parse(lex):

    for i in lex:
        if keyword(i) or \
            operator(i) or \
            integ(i) or \
            floati(i) or \
            ident(i):
            print(i, token[-1])

        else:
            print(i, "Syntax error")

buf=[]
token=[]
parse(read())
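If you would rather not keep the global `buf`/`token` lists, the same idea can be written as pure functions that return `(token, kind)` pairs. This is a sketch, not the answer's code: `classify`, `lex`, `is_digits`, `strip_sign`, `is_integer` and `is_float` are hypothetical names, only ASCII digits count as digits, and `is_float` requires exactly one `.` so that `45` is never reported as a float even if the checks are reordered. It uses string methods only, without regexes or `float()`/`int()`.

```python
import string
from typing import List, Tuple

KEYWORDS = {"if", "while", "for"}
OPERATORS = {"+", "-", "/", "*", "==", "!=", "<", ">", ">=", "<="}
DIGITS = set("0123456789")


def is_digits(s: str) -> bool:
    # Non-empty run of ASCII digits; str.isdigit() would also accept e.g. "\u00b2".
    return s != "" and all(c in DIGITS for c in s)


def strip_sign(tok: str) -> str:
    # Drop a single leading "+" or "-", if present.
    return tok[1:] if tok[:1] in ("+", "-") else tok


def is_integer(tok: str) -> bool:
    return is_digits(strip_sign(tok))


def is_float(tok: str) -> bool:
    # Exactly one ".", with digits on at least one side: "5.4", ".5", "5."
    whole, dot, frac = strip_sign(tok).partition(".")
    if not dot or not (whole or frac):
        return False
    return all(part == "" or is_digits(part) for part in (whole, frac))


def classify(tok: str) -> str:
    if tok in KEYWORDS:
        return tok.upper() + "_TOKEN"
    if tok in OPERATORS:
        return "RELOP"
    if is_integer(tok):  # integers first, as in the answer above
        return "INTEGER"
    if is_float(tok):
        return "FLOAT"
    if tok[0] in string.ascii_lowercase:
        return "ID"
    return "SYNTAX_ERROR"


def lex(text: str) -> List[Tuple[str, str]]:
    # Each token string is returned exactly as it appeared in the input.
    return [(tok, classify(tok)) for tok in text.split()]


for tok, kind in lex("x 45 5.4 -33 size33 34RR if <= while x"):
    print(tok, kind)
```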
Tim Roberts