8

In SLY there is an example for writing a calculator (reproduced from calc.py here):

from sly import Lexer

class CalcLexer(Lexer):
    tokens = { NAME, NUMBER }
    ignore = ' \t'
    literals = { '=', '+', '-', '*', '/', '(', ')' }

    # Tokens
    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

    @_(r'\d+')
    def NUMBER(self, t):
        t.value = int(t.value)
        return t

    @_(r'\n+')
    def newline(self, t):
        self.lineno += t.value.count('\n')

    def error(self, t):
        print("Illegal character '%s'" % t.value[0])
        self.index += 1

It looks like it's bugged because NAME and NUMBER are used before they've been defined. But actually, there is no NameError, and this code executes fine. How does that work? When can you reference a name before it's been defined?

MisterMiyagi
  • 44,374
  • 10
  • 104
  • 119
COVFEFE-19
  • 293
  • 1
  • 8
  • 1
    `Lexer` uses a custom metaclass that handles missing variables. – Barmar Feb 02 '21 at 20:25
  • 4
    There is some *deep* magic going on here - `Lexer` has a metaclass that redefines how its subclasses get defined. Specifically, the subclasses are no longer using a standard Python dict for their namespace, they're using a dict subclass that automatically defines names (as a string equal to the name) on lookup. – jasonharper Feb 02 '21 at 20:25

1 Answers1

6

Python knows four kinds of direct name lookup: builtins / program global, module global, function/closure body, and class body. The NAME, NUMBER are resolved in a class body, and as such subject to the rules of this kind of scope.

The class body is evaluated in a namespace provided by the metaclass, which can implement arbitrary semantics for name lookups. In specific, the sly Lexer is a LexerMeta class using a LexerMetaDict as the namespace; this namespace creates new tokens for undefined names.

class LexerMetaDict(dict):
    ...
    def __getitem__(self, key):
        if key not in self and key.split('ignore_')[-1].isupper() and key[:1] != '_':
            return TokenStr(key, key, self.remap)
        else:
            return super().__getitem__(key)

The LexerMeta is also responsible for adding the _ function to the namespace so that it can be used without imports.

class LexerMeta(type):
    '''
    Metaclass for collecting lexing rules
    '''
    @classmethod
    def __prepare__(meta, name, bases):
        d = LexerMetaDict()

        def _(pattern, *extra):
            ...

        d['_'] = _
        d['before'] = _Before
        return d
MisterMiyagi
  • 44,374
  • 10
  • 104
  • 119