What you're looking for is, well, parsing.
It's not about accepting or rejecting "good" or "bad" spaces; it's about trying to recognize what was entered, and rejecting the input if you can't.
In this case, let's start with a (thoroughly simplified) grammar for the statement in question:
select_statement ::= 'select' field_list 'from' table
So, you read in the first token. If it's `SE` or `SELECTa`, you reject the statement as invalid, because neither of those fits your grammar. Almost any decent parser generator (including, but certainly not limited to, Spirit) makes this fairly trivial: you specify what is acceptable and what to do when the input is not acceptable, and it deals with invoking that for input that doesn't fit the specified grammar.
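For instance, here's a rough sketch of how that grammar might look in Boost.Spirit X3. Treat it as an illustration under my assumptions rather than a drop-in implementation; the names are made up and the exact headers and directives can vary a bit with your Boost version:

```cpp
// Rough sketch of the simplified grammar in Boost.Spirit X3.
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>

namespace x3 = boost::spirit::x3;

// A keyword must not run straight into identifier characters,
// so "SELECTa" won't match the 'select' keyword.
auto keyword = [](char const* kw) {
    return x3::lexeme[x3::no_case[x3::lit(kw)] >> !(x3::alnum | x3::char_('_'))];
};

auto const identifier =
    x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];
auto const field_list = identifier % ',';
auto const select_statement =
    keyword("select") >> field_list >> keyword("from") >> identifier;

bool accepts(std::string const& input) {
    auto first = input.begin();
    bool ok = x3::phrase_parse(first, input.end(), select_statement, x3::space);
    return ok && first == input.end();   // must consume the whole statement
}

int main() {
    std::cout << accepts("select a, b from t") << '\n';   // 1 (accepted)
    std::cout << accepts("SE LECT a from t")   << '\n';   // 0 (rejected)
    std::cout << accepts("SELECTa a from t")   << '\n';   // 0 (rejected)
}
```

The not-predicate after each keyword is what keeps `SELECTa` from being treated as the keyword `select` followed by an identifier `a`.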
As for how you do the tokenization in the first place, it's typically pretty simple, and usually can be based on regular expressions (e.g., many languages have been implemented using lex and derivatives like Flex, which use regexen to specify the tokenization).
For something like this, you specify the keywords of your language directly, so you'd have a rule saying that when it matches `select`, it should return that as a keyword token. Then you have something more general for an identifier, which typically looks something like `[_a-zA-Z][_a-zA-Z0-9]*` ("an identifier starts with an underscore or letter, followed by an arbitrary number of underscores, letters, or digits"). In the examples above, this would be entirely sufficient to find and return `SE` and `SELECTa` as the first tokens of the "bad" inputs.
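If you'd rather see the same idea without a generator, here's a hand-rolled sketch of that tokenization, effectively implementing the regex above by hand (the `Token` and `TokenKind` names are invented for the example):

```cpp
// Hand-rolled tokenizer sketch (no lex/Flex); longest-match identifier scanning
// means "SELECTa" comes back as a single identifier token.
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenKind { Select, From, Identifier, Comma, Star, End };

struct Token {
    TokenKind kind;
    std::string text;
};

std::vector<Token> tokenize(std::string const& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (input[i] == ',') { tokens.push_back({TokenKind::Comma, ","}); ++i; continue; }
        if (input[i] == '*') { tokens.push_back({TokenKind::Star, "*"}); ++i; continue; }
        if (std::isalpha(c) || input[i] == '_') {
            // [_a-zA-Z][_a-zA-Z0-9]* -- consume as many identifier characters as possible
            std::size_t start = i;
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_'))
                ++i;
            std::string text = input.substr(start, i - start);
            std::string lower;
            for (char ch : text)
                lower += static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
            // Keywords are matched first; anything else is an identifier.
            if (lower == "select")    tokens.push_back({TokenKind::Select, text});
            else if (lower == "from") tokens.push_back({TokenKind::From, text});
            else                      tokens.push_back({TokenKind::Identifier, text});
            continue;
        }
        throw std::runtime_error("unexpected character in input");
    }
    tokens.push_back({TokenKind::End, ""});
    return tokens;
}
```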
Your parser would then detect that the first thing it received was an identifier instead of a keyword, at which point it would (presumably) reject the statement.
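Continuing that sketch, the parser itself is just a walk over the token stream, checking it against the grammar; this assumes the `Token`, `TokenKind`, and `tokenize` pieces from the previous snippet:

```cpp
// Recursive-descent check of: select_statement ::= 'select' field_list 'from' table
// Assumes the Token/TokenKind types and tokenize() from the sketch above.
bool parse_select(std::vector<Token> const& tokens) {
    std::size_t pos = 0;
    auto accept = [&](TokenKind k) {
        if (tokens[pos].kind == k) { ++pos; return true; }
        return false;
    };

    if (!accept(TokenKind::Select)) return false;   // "SE" or "SELECTa" is rejected here
    // field_list: '*' or identifier (',' identifier)*
    if (!accept(TokenKind::Star)) {
        if (!accept(TokenKind::Identifier)) return false;
        while (accept(TokenKind::Comma))
            if (!accept(TokenKind::Identifier)) return false;
    }
    if (!accept(TokenKind::From)) return false;
    if (!accept(TokenKind::Identifier)) return false;   // table name
    return accept(TokenKind::End);                      // nothing left over
}
```

With this, `tokenize("SELECTa * from t")` produces an identifier token `SELECTa` up front, so `parse_select` fails at the very first check.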