
I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords that should be recognized as a token. `data` is a string containing about 1000 keywords, all separated by whitespace, which should all be recognized as one kind of keyword. These can be, for example, `_Function1`, `_UDFType2`, and so on. I just want the lexer to recognize the words in this list and return a token of type `KEYWORD`.

data = 'Keyword1 Keyword2 Keyword3 Keyword4'
def t_KEYWORD(t):
    # ... r'\$' + data ??
    return t

text = '''
Some test data


even more

$var = 2231




$[]Test this 2.31 + / &
'''

autoit = lex.lex()
autoit.input(text)
while True:
    tok = autoit.token()
    if not tok: break
    print(tok)

So I was trying to add the variable to that regex, but it didn't work. I always get: No regular expression defined for rule 't_KEYWORD'.

Thank you in advance! John

Sean M.
  • What code did you use to add it to the regex (show the code that actually raises the error) – David Robinson Aug 31 '12 at 14:52
  • I still don't follow. That line is commented out. Can you show an example that actually throws that error? – David Robinson Aug 31 '12 at 14:56
  • well just use my code from above or here: `data = 'Keyword1 Keyword2 Keyword3 Keyword4' def t_KEYWORD(t): r'\$' + data return t` – Sean M. Aug 31 '12 at 14:58
  • That code doesn't throw an exception. Where is the line that actually uses the `t_KEYWORD` in the regex? – David Robinson Aug 31 '12 at 15:00
  • Thats all i get: `ERROR: /Users/John/Lexer/lexer.py:21: No regular expression defined for rule 't_KEYWORD' Traceback (most recent call last): File "/Users/John/Lexer/lexer.py", line 77, in autoit = lex.lex() File "/Library/Frameworks/Python.framework/Versions/3.0/lib/python3.0/site-packages/ply-3.4-py3.0.egg/ply/lex.py", line 894, in lex raise SyntaxError("Can't build lexer") SyntaxError: Can't build lexer` – Sean M. Aug 31 '12 at 15:02
  • Where is the line `autoit = lex.lex()` that is throwing the traceback? It's not in the code that's provided. The code you provide is just defining a function and never actually does anything with regular expressions or `ply.lex` – David Robinson Aug 31 '12 at 15:04
  • Okay, let's back up a second. `ply` already has a decorator -- `TOKEN` -- to do some of the docstring magic people are suggesting. See [here](http://www.dabeaz.com/ply/ply.html#ply_nn14), for example. But I'm not sure if you want to construct 4 separate tokens and have each of them recognized separately (which this wouldn't do anyway), or if you have one keyword with four variations, or what. Could you edit your post to be a little more specific? – DSM Aug 31 '12 at 15:30
  • Okay, edited my post above. Hope that makes it more understandable. – Sean M. Aug 31 '12 at 15:41

3 Answers


As @DSM suggests, you can use the `TOKEN` decorator. The regular expression to find tokens like cat or dog is `'cat|dog'` (that is, words separated by `'|'` rather than a space). So try:

from ply.lex import TOKEN

data = data.split()  # make data a list of keywords

@TOKEN('|'.join(data))
def t_KEYWORD(t):
    return t
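
For context, here is a minimal self-contained sketch of how that rule could sit in a full lexer. Everything besides the `KEYWORD` rule (the sample keywords in `data`, the `VARIABLE` and `NUMBER` rules, and the input string) is made up for illustration, and `re.escape` is thrown in in case a keyword contains a regex metacharacter:

import re
import ply.lex as lex
from ply.lex import TOKEN

# Hypothetical stand-ins: the real 'data' string holds ~1000 keywords.
data = '_Function1 _UDFType2 _Keyword3'

# Escape each keyword, join them with '|', and add a trailing \b so that
# '_Function1' does not match inside a longer word like '_Function1x'.
keyword_re = r'(?:' + '|'.join(map(re.escape, data.split())) + r')\b'

tokens = ('KEYWORD', 'VARIABLE', 'NUMBER')

t_ignore = ' \t\n'

@TOKEN(keyword_re)
def t_KEYWORD(t):
    return t

def t_VARIABLE(t):
    r'\$[A-Za-z_]\w*'
    return t

def t_NUMBER(t):
    r'\d+'
    return t

def t_error(t):
    t.lexer.skip(1)

autoit = lex.lex()
autoit.input('_Function1 $var 2231')
for tok in autoit:
    print(tok)

The trailing `\b` (or sorting the keywords longest-first) matters because Python's `re` alternation takes the first alternative that matches, not the longest one.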
Andy Hayden

ply.lex uses the docstring of each token function as its regular expression. Note that the order in which you define token functions determines their precedence, which is usually important to manage.


A docstring cannot be an expression (it has to be a string literal), so at first glance you would need to write this out token definition by token definition. However, you can assign the docstring after the function has been defined.

We can test this in the interpreter:

def f():
    "this is " + "my help"  # an expression, not a docstring :(

f.__doc__  # is None (use __doc__; func_doc exists only in Python 2)
f.__doc__ = "this is " + "my help"  # now it is!

Hence this ought to work:

def t_KEYWORD(token):
    return token

t_KEYWORD.__doc__ = r'REGULAR EXPRESSION HERE'  # can be built from an expression
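
Applied to the keyword list from the question, that could look roughly like the sketch below (the keywords are placeholders; the important part is that the `__doc__` assignment happens before `lex.lex()` is called):

import ply.lex as lex

data = '_Function1 _UDFType2 _Keyword3'  # placeholder for the ~1000 real keywords

tokens = ('KEYWORD',)
t_ignore = ' \t\n'

def t_KEYWORD(t):
    return t

# Build the regex from the keyword list and attach it as the docstring
# after the fact, before the lexer is built.
t_KEYWORD.__doc__ = '|'.join(data.split())

def t_error(t):
    t.lexer.skip(1)

autoit = lex.lex()
autoit.input('_UDFType2 _Function1')
for tok in autoit:
    print(tok)  # two KEYWORD tokens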
Andy Hayden
  • Could you define an empty function, and modify `f.__doc__ = 'my' + 'regex'` ? – dbr Aug 31 '12 at 15:07
  • Hm, I have about 1000 keywords; it's impossible to do it `token definition by token definition`. – Sean M. Aug 31 '12 at 15:13
  • @JohnSmith updated with a possible fix! This *should* work, please give it a try. – Andy Hayden Aug 31 '12 at 15:15
  • I've tried: `def t_KEYWORD(t): return t t_KEYWORD.func_doc=r'\d+'` but still the same error – Sean M. Aug 31 '12 at 15:23
  • @JohnSmith it seems like ply.lex reads the docstring immediately, possibly even at runtime: but I asked [this question](http://stackoverflow.com/questions/12218578/setting-the-docstring-to-an-expression-inside-def). – Andy Hayden Aug 31 '12 at 15:39

Not sure if this works with ply, but the docstring is the `__doc__` attribute of a function, so if you write a decorator that takes a string expression and sets it as the function's `__doc__` attribute, ply might pick it up.
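
A rough sketch of such a decorator is below; `set_regex` is a made-up name, and this is essentially what ply's built-in `TOKEN` decorator (used in the first answer) already does:

def set_regex(pattern):
    """Return a decorator that installs `pattern` as the function's docstring."""
    def decorate(func):
        func.__doc__ = pattern
        return func
    return decorate

@set_regex('|'.join(data.split()))  # 'data' is the keyword string from the question
def t_KEYWORD(t):
    return t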

James Thiele