I'm writing a simple script parser in JavaScript, and I want to use a Regex for the tokenizing part of the lexer.
There are certain tokens I'm looking for, like (including the quotes):
- "last-name"
- "first-name"
- "staff-id"
I also look for horizontal whitespace and vertical whitespace.
Finally, I want to match whatever else is not matched by those tokens or by whitespace.
The Regex would look something like:
("last-name"|"first-name"|"staff-id")|([\t ]+)|([\r\n]+)|(.+?)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^   ^^^
 tokens                                whitespace         catch-all
But I'm having a problem with the "catch-all" at the end: (.+?)
Each match of this last group captures only a single character.
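For reference, here is a minimal reproduction of what I'm seeing. The input string and variable names are made up for illustration, and I've added the `g` flag so that `matchAll` can walk the whole input:

```javascript
// Hypothetical sample input, only for illustration.
const input = 'foo "last-name" bar';

// Same pattern as above, with the global flag added for matchAll.
const pattern = /("last-name"|"first-name"|"staff-id")|([\t ]+)|([\r\n]+)|(.+?)/g;

// Collect the full text of every match, in order.
const matches = [...input.matchAll(pattern)].map((m) => m[0]);

console.log(matches);
// [ 'f', 'o', 'o', ' ', '"last-name"', ' ', 'b', 'a', 'r' ]
```

The lazy `.+?` is satisfied after one character, so "foo" and "bar" each come back as three separate matches.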
What I want is to capture everything together in that catch-all instead of one character at a time. I've Googled around and looked at Stack Overflow answers, like the following:
- Match a specific sequence or everything else with regex
- How do I match everything except matched value?
One workaround is to concatenate the "catch-all" results, one character at a time. That's fine for this particular project, but for another one I'd rather have a Regex solution that captures everything else in a single "catch-all" match, if that's even possible.
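A rough sketch of that concatenation workaround, in case it helps to see it. The `tokenize` helper, the `tokenPattern` name, and the type labels ("token", "hspace", "vspace", "other") are hypothetical names I made up for this example, not part of any library:

```javascript
// Same pattern as in the question, with the global flag added for matchAll.
const tokenPattern = /("last-name"|"first-name"|"staff-id")|([\t ]+)|([\r\n]+)|(.+?)/g;

// Hypothetical helper: returns a list of { type, text } objects and merges
// consecutive one-character catch-all matches into a single "other" token.
function tokenize(input) {
  const tokens = [];
  for (const m of input.matchAll(tokenPattern)) {
    // Exactly one capture group is defined for each match.
    const type =
      m[1] !== undefined ? "token" :
      m[2] !== undefined ? "hspace" :
      m[3] !== undefined ? "vspace" : "other";

    const last = tokens[tokens.length - 1];
    if (type === "other" && last && last.type === "other") {
      // Extend the previous catch-all token instead of starting a new one.
      last.text += m[0];
    } else {
      tokens.push({ type, text: m[0] });
    }
  }
  return tokens;
}

console.log(tokenize('foo "last-name" bar').map((t) => t.text));
// [ 'foo', ' ', '"last-name"', ' ', 'bar' ]
```

This works, but the merging happens in JavaScript rather than in the Regex itself, which is the part I'd like to avoid.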
So how can I capture "everything else" that I haven't already matched in a Regex?