I'm parsing regular expressions with yacc (actually PLY), and some of the rules look like this:
expr : expr expr
expr : expr '|' expr
expr : expr '*'
The problem is that the first rule (concatenation) must take precedence over the second rule, but not over the third.
However, the concatenation rule has no operator in it.
How can I specify the precedence correctly in this case?
Thank you!
EDIT:
I modified the rules to avoid the issue, but I'm still curious what the problem was.
Here is the source code:
import ply.lex as lex
import ply.yacc as yacc

# Token definitions
tokens = ['PLEFT', 'PRIGHT', 'BAR', 'ASTERISK', 'NORMAL']
t_PLEFT = r'\('
t_PRIGHT = r'\)'
t_BAR = r'\|'
t_ASTERISK = r'\*'
t_NORMAL = r'[a-zA-Z0-9]'
lex.lex()

# Listed from lowest to highest precedence.
# CONCAT is not a real token; it is only referenced through %prec.
precedence = (
    ('left', 'BAR'),
    ('left', 'CONCAT'),
    ('left', 'ASTERISK'),
)

def p_normal(p):
    '''expr : NORMAL'''
    p[0] = p[1]

def p_par(p):
    '''expr : PLEFT expr PRIGHT'''
    p[0] = p[2]

def p_or(p):
    '''expr : expr BAR expr'''
    p[0] = ('|', p[1], p[3])

def p_concat(p):
    '''expr : expr expr %prec CONCAT'''
    p[0] = ('CONCAT', p[1], p[2])

def p_repeat(p):
    '''expr : expr ASTERISK'''
    p[0] = ('*', p[1])

yacc.yacc()
Parsing 'ab|cd*' gives ('CONCAT', ('|', ('CONCAT', 'a', 'b'), 'c'), ('*', 'd')).
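For comparison, here is a minimal sketch of the tree that the intended precedence (alternation lowest, then concatenation, then `*`) should produce. It is a hand-written recursive-descent parser rather than PLY, so it only illustrates the expected result and does not explain the PLY behaviour. The layered grammar in the docstring and the names `parse_regex` and `NORMAL` are assumptions made for illustration; they are not the modified rules mentioned above.

```python
import string

# Characters accepted as a NORMAL token, mirroring t_NORMAL = r'[a-zA-Z0-9]'.
NORMAL = set(string.ascii_letters + string.digits)

def parse_regex(text):
    """Parse a tiny regex dialect into nested tuples.

    Assumed layered grammar, lowest to highest precedence:
        expr   : term ('|' term)*
        term   : factor factor*
        factor : atom '*'*
        atom   : NORMAL | '(' expr ')'
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def advance():
        nonlocal pos
        ch = text[pos]
        pos += 1
        return ch

    def expr():
        node = term()
        while peek() == '|':
            advance()
            node = ('|', node, term())
        return node

    def term():
        node = factor()
        # Concatenation has no operator: continue while another atom can start.
        while peek() is not None and (peek() in NORMAL or peek() == '('):
            node = ('CONCAT', node, factor())
        return node

    def factor():
        node = atom()
        while peek() == '*':
            advance()
            node = ('*', node)
        return node

    def atom():
        ch = peek()
        if ch == '(':
            advance()
            node = expr()
            if peek() != ')':
                raise SyntaxError(f"expected ')' at position {pos}")
            advance()
            return node
        if ch is not None and ch in NORMAL:
            return advance()
        raise SyntaxError(f"unexpected {ch!r} at position {pos}")

    tree = expr()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree
```

With this sketch, `parse_regex('ab|cd*')` returns `('|', ('CONCAT', 'a', 'b'), ('CONCAT', 'c', ('*', 'd')))`, which is the grouping I was hoping to get from the PLY grammar.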