I've been working on a text editor that uses LPEG to implement syntax highlighting support. Getting things up and running was pretty simple, but I've only done the minimum required.
I've defined a bunch of patterns like this:
-- Keywords
local keyword = C(
P"auto" +
P"break" +
P"case" +
P"char" +
P"int"
-- more ..
) / function() add_syntax( RED, ... )
This correctly handles input, but unfortunately matches too much. For example int
matches in the middle of printf
, which is expected because I'm using "P
" for a literal match.
Obviously to perform "proper" highlighting I need to match on word-boundaries, such that "int" matches "int", but not "printf", "vsprintf", etc.
I tried to use this to limit the match to only occurring after "<[{ \n
", but this didn't do what I want:
-- space, newline, comma, brackets followed by the keyword
S(" \n(<{,")^1 * P"auto" +
Is there a simple, obvious, solution I'm missing here to match only keywords/tokens that are surrounded by whitespace or other characters that you'd expect in C-code? I do need the captured token so I can highlight it, but otherwise I'm not married to any particular approach.
e.g. These should match:
int foo;
void(int argc,std::list<int,int> ) { .. };
But this should not:
fprintf(stderr, "blah. patterns are hard\n");