I'm trying to create a compiler in python and I'm using the re
module to create tokens. The language will be very similar to Assembly
.
Almost everything is working, but I'm having trouble with a token. Let me give an example of what would be this token:
mov [eax], 4
mov [name],2
mov eax, [ebx]
Tokens: [eax], [ebx]
I can find what I want using this pattern: \[(eax|ebx)\]
But I get an error when use with other patterns, I believe it is because of the '|'.
SCANNER = re.compile(r"""
;(.)* # comment
|(\[-?[0-9]+\]) # memory_int
|(\[-?0x[0-9a-fA-F]+\]) # memory_hex
|(\[(eax|ebx)\]) # memory access with registers
""", re.VERBOSE)
for match in re.finditer(SCANNER, lines[i]):
comment, memory_int, memory_hex, memory_reg = match.groups()
Error:
ValueError: too many values to unpack (expected 4)
Is there any way to replace the '|'
with another character?