I am trying to parse a file using PLY. I need to recognize expressions of the form "a = 1;" (name, equals sign, number, semicolon). Everything works as long as each line of the input contains only one expression (a = 1;), but yacc reports a syntax error when a line contains several (a = 1; b = 2;). I have confirmed that the input is tokenized correctly, so I am not sure what the issue is.

Here is a minimal version of my code:

import ply.lex as lex
import ply.yacc as yacc

tokens = (
    'EQUALS',
    'SEMICOLON',
    'NAME',
    'VALUE',
)
t_ignore = ' \t'
def t_EQUALS(t):
    r"""="""
    return t
def t_SEMICOLON(t):
    r""";"""
    return t
def t_NAME(t):
    r"""\S+\s*(?==)"""
    t.value = t.value.strip()
    return t
def t_VALUE(t):
    r"""(?<==)[^;]+"""
    t.value = t.value.strip()
    return t
def t_newline(t):
    r"""\n"""
    t.lexer.lineno += len(t.value)
def t_error(t):
    print("Illegal character '%s' on line %d" % (t.value[0], t.lineno))
    t.lexer.skip(1)
def t_eof(t):
    return None

lexer = lex.lex()

def p_expression(p):
    '''
    expression : item
               | empty
    '''
    p[0] = p[1]
def p_item(p):
    '''
    item : NAME EQUALS VALUE SEMICOLON
    '''
    p[0] = ('name', p[1], p[3])
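def p_empty(p):
    '''
    empty :
    '''
    # Assumed definition: the grammar references `empty`, but the minimal version omits it
    p[0] = None
def p_error(p):
    # p is None when the error is detected at the end of the input
    if p is None:
        print('Syntax error at end of input')
    else:
        print('Syntax error ' + str(p) + " Line " + str(p.lineno))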

parser = yacc.yacc()

for line in file:
    print(parser.parse(line))
Eric
  • Nothing in that grammar recognises a semicolon. And `expression` (which is the start symbol) only matches a single item. If you want to match multiple items separated by semicolons, you'll need to add a syntax for that. – rici May 29 '21 at 17:54
  • There are lots of duplicates, but I don't have time to find one right now. – rici May 29 '21 at 17:55
  • Here's one: https://stackoverflow.com/questions/62901425/why-yacc-can-not-parse-a-second-line-of-a-grammar-rule-even-when-it-parses-corre/62903750#62903750 – rici May 29 '21 at 20:35
  • I am not sure what you mean by "nothing in that grammar recognises a semicolon." I have a semicolon token and a grammar rule that use the semicolon tokens. I also took the 'duplicate' posting and it does not solve the issue I am having. The proposed solution in that posting just causes yacc to throw unknown conflict errors. – Eric Jun 01 '21 at 15:53
  • Sorry, now I see the semicolon in your grammar. But you still only recognise a single item in `expression` (or nothing), so the parser will parse at most one `item`. I suppose you attempted something like `program: | program expression`. That won't work because `expression` can be empty, which makes the grammar ambiguous and produces the conflicts you saw. You need something like `program: | program item`. Adjust as necessary. – rici Jun 01 '21 at 16:58
  • Or remove the empty alternative from `expression`. Although it's not clear to me what value `expression` is adding, unless you have some plans to expand on it. – rici Jun 01 '21 at 17:04
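
For reference, the `program: | program item` suggestion from the comments could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the token patterns are simplified stand-ins (identifier-like names and integer values) rather than the lookaround-based patterns from the question, the rule names `p_program_empty` and `p_program_item` are made up for illustration, and the `debug`/`write_tables` keyword arguments are the ones accepted by PLY 3.x.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NAME', 'EQUALS', 'VALUE', 'SEMICOLON')

# Simplified token patterns (an assumption, not the patterns from the question)
t_EQUALS = r'='
t_SEMICOLON = r';'
t_NAME = r'[A-Za-z_][A-Za-z0-9_]*'
t_VALUE = r'\d+'
t_ignore = ' \t'

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()

def p_program_empty(p):
    'program :'
    # An empty program is an empty list of items
    p[0] = []

def p_program_item(p):
    'program : program item'
    # Left recursion: append the newly parsed item to the items collected so far
    p[0] = p[1] + [p[2]]

def p_item(p):
    'item : NAME EQUALS VALUE SEMICOLON'
    p[0] = ('name', p[1], p[3])

def p_error(p):
    # p is None when the error is detected at the end of the input
    if p is None:
        print('Syntax error at end of input')
    else:
        print('Syntax error at %r' % (p.value,))

# PLY 3.x keyword arguments; write_tables=False avoids creating parsetab.py
parser = yacc.yacc(start='program', debug=False, write_tables=False)

print(parser.parse('a = 1; b = 2;', lexer=lexer))
# [('name', 'a', '1'), ('name', 'b', '2')]
```

Because only `program` can be empty here and `item` cannot, the parser always knows whether to start another `item` or stop, so a line with any number of assignments (including none) parses without conflicts.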
