1

Is there a way to special-case a ply lexer rule?

t_IDENT     = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_OPERATOR  = r'[<>=/*+-]+'
t_DEFINE    = r'='
t_PRODUCES  = r'=>'

I want to define an operator as any combination of the listed characters, except that = and => have their own special cases. For example:

a + b
# IDENT('a') OPERATOR('+') IDENT('b') 

a ++=--> b
# IDENT('a') OPERATOR('++=-->') IDENT('b') 

a == b
# IDENT('a') OPERATOR('==') IDENT('b') 

a => b
# IDENT('a') PRODUCES('=>') IDENT('b') 

a = b
# IDENT('a') DEFINE('=') IDENT('b') 

a >= b
# IDENT('a') OPERATOR('>=') IDENT('b') 

a <=> b
# IDENT('a') OPERATOR('<=>') IDENT('b') 
Jason S
  • possible duplicate of [Ply Lex parsing problem](http://stackoverflow.com/questions/5022129/ply-lex-parsing-problem) – Jason S Jul 20 '14 at 23:14
  • Never mind, this is essentially the same as the reserved word problem in the above SO question, also in the docs here: http://www.dabeaz.com/ply/ply.html#ply_nn6 – Jason S Jul 20 '14 at 23:15

2 Answers

2

Yes. You get OPERATOR tokens instead of the expected PRODUCES/DEFINE because of the PLY lexer's token precedence rules:

Internally, lex.py uses the re module to do its pattern matching. When building the master regular expression, rules are added in the following order:

  1. All tokens defined by functions are added in the same order as they appear in the lexer file.
  2. Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

Convert the special-case rules into functions so that they are tried before the string-defined t_OPERATOR rule:

def t_DEFINE(t):
    r'='
    return t

def t_PRODUCES(t):
    r'=>'
    return t
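One thing worth checking before relying on this: function rules are tried in definition order, so the order of t_DEFINE and t_PRODUCES may matter, and a single-character rule that runs before t_OPERATOR can split longer operators. The sketch below emulates first-match alternation with plain `re`; it is not PLY itself, `tokenize` is a hypothetical helper, and whitespace skipping stands in for a `t_ignore` setting the answer does not show.

```python
import re

def tokenize(ordered_rules, text):
    """Hypothetical helper: first-match alternation over rules in the given order."""
    master = re.compile("|".join("(?P<%s>%s)" % r for r in ordered_rules))
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # assumed: whitespace is ignored
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None:
            raise ValueError("illegal character %r at %d" % (text[pos], pos))
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# String rules stay after the function rules, longest regex first.
STRING_RULES = [("IDENT", r"[a-zA-Z_][0-9a-zA-Z_]*"), ("OPERATOR", r"[<>=/*+-]+")]

# Function rules in the order shown above: DEFINE, then PRODUCES.
as_written = [("DEFINE", r"="), ("PRODUCES", r"=>")] + STRING_RULES
# The same rules with PRODUCES defined first.
swapped = [("PRODUCES", r"=>"), ("DEFINE", r"=")] + STRING_RULES

print(tokenize(as_written, "a => b"))  # '=' is consumed by DEFINE before '=>' is tried
print(tokenize(swapped, "a => b"))     # PRODUCES now matches the whole '=>'
print(tokenize(swapped, "a == b"))     # '==' is split into two DEFINE tokens
```

Under this emulation, defining t_PRODUCES before t_DEFINE fixes `=>`, but `==` still comes out as two DEFINE tokens rather than one OPERATOR, which does not match the behaviour asked for in the question.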
Eldar Abusalimov
0

I removed the string-defined t_DEFINE and t_PRODUCES rules and used the reserved-word technique to handle the special cases:

special_operators = {'=': 'DEFINE',
                     '=>': 'PRODUCES'}

def t_OPERATOR(t):
    r'[<>=/*+-]+'
    # Retype the token only when the whole match is exactly '=' or '=>'.
    t.type = special_operators.get(t.value, t.type)
    return t
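To check the lookup against the examples in the question without installing PLY, here is a minimal standard-library sketch of the same idea. `TOKEN_RE` and `lex` are hypothetical stand-ins for the lexer, and `finditer` silently skips whitespace and any unmatched characters, which a real lexer would report.

```python
import re

special_operators = {"=": "DEFINE", "=>": "PRODUCES"}

# Stand-in for the two remaining rules: IDENT and the greedy OPERATOR.
TOKEN_RE = re.compile(
    r"(?P<IDENT>[a-zA-Z_][0-9a-zA-Z_]*)|(?P<OPERATOR>[<>=/*+-]+)"
)

def lex(text):
    """Hypothetical helper: tokenize text, retyping exact '=' and '=>' matches."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "OPERATOR":
            # Same lookup as in t_OPERATOR: only a whole-token match is remapped.
            kind = special_operators.get(m.group(), kind)
        tokens.append((kind, m.group()))
    return tokens

for src in ["a + b", "a ++=--> b", "a == b", "a => b", "a = b", "a >= b", "a <=> b"]:
    print(src, lex(src))
```

The operator is matched greedily first and only then compared as a whole against the dictionary, so `==`, `>=` and `<=>` stay OPERATOR while a bare `=` or `=>` is retyped.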
Jason S