I'm currently writing my own parser for a fictional assembly language. The instructions look much like those of any ordinary assembly language:
[INSTRUCTION] [OP]*
where [OP]* stands for zero to three operands. I want a regular expression that matches this. The parser is written in C++ with boost::regex. I'm new to regular expressions and am still working through the Boost documentation to learn what each symbol does.
I already have an expression that matches zero to three space-delimited operands:
Sample Instructions:
MOVI 8 10
ADDI 8 8 10
NOP
BNEZI -1
Expression: ^([a-z]+)( ([-,0-9]+))*
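For reference, here is a minimal sketch of how the expression above can be exercised against the sample instructions. It rests on two assumptions that are not stated in the question: std::regex stands in for boost::regex (the two offer a very similar interface), and the pattern is compiled case-insensitively, since the character class is lowercase while the samples are uppercase. The function name matches_space_delimited is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex is used in place of boost::regex.
// Assumption: icase is set, because [a-z] alone would not match "MOVI".
// Hypothetical helper: true when the whole line fits the expression.
inline bool matches_space_delimited(const std::string& line) {
    static const std::regex pattern("^([a-z]+)( ([-,0-9]+))*",
                                    std::regex::ECMAScript | std::regex::icase);
    // regex_match requires the entire line to match, not just a prefix.
    return std::regex_match(line, pattern);
}
```

Note that a repeated capture group only retains its last iteration, so reading the third sub-match after a successful match gives the final operand rather than all of them.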
However, I can't create a suitable expression that handles the same instructions when comma-delimited:
Sample Instructions:
MOVI 8, 10
ADDI 8, 8, 10
This is really tripping me up. I tried rewriting my expression like so:
^([a-z]+)( ([-,0-9]+))*(, ([-,0-9]+))*
This looks like a very inexperienced, poorly built regular expression, and it doesn't work correctly either. I was thinking of using a recursive expression, but I looked at the documentation and I might as well scribble "overkill" on my forehead.
I realize I could just preprocess the line to strip out all the commas, but I would rather be able to write and understand the regular expression first, and only then do it the easy way. Any help would be appreciated.
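For completeness, the "easy way" mentioned above might look roughly like the sketch below: remove the commas, then reuse the space-delimited expression unchanged. As before, std::regex in place of boost::regex and the icase flag are assumptions, and the helper names strip_commas and matches_after_stripping are hypothetical.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Hypothetical helper: drop every comma, so "ADDI 8, 8, 10" becomes "ADDI 8 8 10".
inline std::string strip_commas(std::string line) {
    line.erase(std::remove(line.begin(), line.end(), ','), line.end());
    return line;
}

// Assumption: std::regex stands in for boost::regex, compiled with icase
// because the samples are uppercase and the character class is lowercase.
inline bool matches_after_stripping(const std::string& line) {
    static const std::regex pattern("^([a-z]+)( ([-,0-9]+))*",
                                    std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(strip_commas(line), pattern);
}
```

This sidesteps the comma handling entirely, which is exactly why I'd like to understand the pure regular expression approach first.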