The problem with going the regex route is that you run into issues with spaces. There is probably a really complicated regex that handles this, but with a simple one you'll find that the values in your searches can't contain spaces, e.g.:
Works: site:mysite user:john
Fails: site:"my awesome site" user:john
This fails because the query is tokenized on spaces. So if space support is a requirement, read on...
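As a rough illustration of that failure (a Python sketch of the naive approach, not code from any of the libraries mentioned here; `parse_fields` is a made-up name), splitting on spaces and then on the first colon loses most of the quoted value:

```python
def parse_fields(query):
    """Naive parser: split on spaces, then split each token on the first colon."""
    fields = {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep:  # only keep tokens that actually contain a colon
            fields[key] = value
    return fields


print(parse_fields('site:mysite user:john'))
# {'site': 'mysite', 'user': 'john'}

print(parse_fields('site:"my awesome site" user:john'))
# {'site': '"my', 'user': 'john'}
```

The second query is cut into `site:"my`, `awesome`, `site"` and `user:john`, so the site name never makes it through intact.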
I would recommend either using the Lucene .NET engine's inbuilt parser to give you the tokens, or using a grammar and a parser such as GoldParser, Irony or Antlr.
It might sound too long and complicated for what you want, but having written a grammar for GoldParser to do exactly what you're doing, it is actually quite easy once the grammar is done. Here's an example of the grammar:
"Name" = 'Spruce Search Grammar'
"Version" = '1.1'
"About" = 'The search grammar for Spruce TFS MVC frontend'
"Start Symbol" = <Query>
! -------------------------------------------------
! Character Sets
! -------------------------------------------------
{Valid} = {All Valid} - ['-'] - ['OR'] - {Whitespace} - [':'] - ["] - ['']
{Quoted} = {All Valid} - ["] - ['']
! -------------------------------------------------
! Terminals
! -------------------------------------------------
AnyChar = {Valid}+
Or = 'OR'
Negate = ['-']
StringLiteral = '' {Quoted}+ '' | '"' {Quoted}+ '"'
! -- Field-specific terms
Project = 'project' ':'
...
CreatedOn = 'created-on' ':'
ResolvedOn = 'resolved-on' ':'
! -------------------------------------------------
! Rules
! -------------------------------------------------
! The grammar starts below
<Query> ::= <Query> <Keywords> | <Keywords>
<SingleWord> ::= AnyChar
<Keywords> ::= <SingleWord>
             | <QuotedString>
             | <Or>
             | <Negate>
             | <FieldTerms>
<Or> ::= <Or> <SingleWord>
       | Or Negate
       | Or <SingleWord>
       | Or <QuotedString>
<Negate> ::= <Negate> Negate <SingleWord>
           | <Negate> Negate <QuotedString>
           | Negate <SingleWord>
           | Negate <QuotedString>
<QuotedString> ::= StringLiteral
<FieldTerms> ::= <FieldTerms> Project | <FieldTerms> Description | <FieldTerms> State
               | <FieldTerms> Type | <FieldTerms> Area | <FieldTerms> Iteration
               | <FieldTerms> AssignedTo | <FieldTerms> ResolvedBy
               | <FieldTerms> ResolvedOn | <FieldTerms> CreatedOn
               | Project
               | <Description>
               | State
               | Type
               | Area
               | Iteration
               | CreatedBy
               | AssignedTo
               | ResolvedBy
               | CreatedOn
               | ResolvedOn
<Description> ::= <Description> Description | <Description> Description StringLiteral
                | Description | Description StringLiteral
This gives you search support for something like:
resolved-by:john project:"amazing tfs project"
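To give a feel for what the terminals are doing with a query like that, here is a hand-rolled Python approximation. It is only a sketch and not how GoldParser works internally: the `FIELDS` tuple, the token names and the `tokenize` function are assumptions made for illustration, and the field list only includes a few of the fields the grammar defines.

```python
import re

# Hypothetical subset of the field names used by the grammar.
FIELDS = ("project", "created-on", "resolved-on", "resolved-by")

# Alternatives are tried in order, so field prefixes and quoted
# strings win over plain words.
TOKEN_RE = re.compile("|".join([
    r"(?P<field>(?:%s):)" % "|".join(re.escape(f) for f in FIELDS),
    r"""(?P<string>"[^"]+"|'[^']+')""",   # roughly StringLiteral
    r"(?P<or>OR\b)",                      # roughly Or
    r"(?P<negate>-)",                     # roughly Negate
    r"""(?P<word>[^\s:"'-]+)""",          # roughly AnyChar
]))


def tokenize(query):
    """Return a list of (kind, text) pairs; unmatched characters are skipped."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        kind, text = match.lastgroup, match.group()
        if kind == "string":
            text = text[1:-1]   # drop the surrounding quotes
        elif kind == "field":
            text = text[:-1]    # drop the trailing colon
        tokens.append((kind, text))
    return tokens


print(tokenize('resolved-by:john project:"amazing tfs project"'))
# [('field', 'resolved-by'), ('word', 'john'), ('field', 'project'), ('string', 'amazing tfs project')]
```

The quoted project name survives as a single token, which is exactly what the space-splitting approach can't give you.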
If you look at the <Keywords> rule, you can see it is expecting a single word, an OR, a quoted string, a negation (a NOT), or field terms. The hard part comes when a definition becomes recursive, which you can see in the <Description> rule.
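If it helps to see the <Keywords> alternatives as ordinary code, the following Python sketch groups a flat token list into keyword entries. It is a simplification written for this answer, not what GoldParser or Calitha generate: the `(kind, text)` token shape and the `group_keywords` name are assumptions, and it is stricter than the grammar in that every OR, negation and field prefix must be followed by a word or quoted string.

```python
def group_keywords(tokens):
    """Group (kind, text) tokens into keyword entries.

    kind is one of "word", "string", "or", "negate" or "field".
    """
    operand_kinds = ("word", "string")
    keywords = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind in operand_kinds:
            # A bare single word or quoted string is a keyword on its own.
            keywords.append({"type": kind, "value": text})
            i += 1
            continue
        # OR, negation and field prefixes consume the token that follows.
        if i + 1 >= len(tokens) or tokens[i + 1][0] not in operand_kinds:
            raise ValueError("%r must be followed by a word or quoted string" % text)
        entry = {"type": kind, "value": tokens[i + 1][1]}
        if kind == "field":
            entry["field"] = text
        keywords.append(entry)
        i += 2
    return keywords


example = [
    ("field", "project"), ("string", "amazing tfs project"),
    ("negate", "-"), ("word", "closed"),
]
print(group_keywords(example))
```

A generated parser does this bookkeeping for you from the grammar, which is the main reason the grammar route pays off once the rules start nesting.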
The syntax is a variant of BNF (Backus-Naur Form), which describes the format of your language. You can write something as simple as a search query parser in it, or an entire programming language. The way GoldParser parses the tokens will restrict you, as it is an LALR parser with limited lookahead, so formats such as HTML and wiki syntax will break any grammar you attempt to write, because they don't force you to close tags/tokens. Antlr gives you LL(*), which is more forgiving of missing start tags/tokens, but that isn't something you need to worry about for a search query parser.
The code folder for my grammar and C# code can be found in this project.
QueryParser is the class that parses the search string, the grammar file is the .grm file, and the 2 MB file is the compiled grammar, which is how GoldParser optimises your grammar into its own table of possibilities. Calitha is the C# library for GoldParser, and it is easy enough to implement. Without writing an even larger answer it's hard to describe exactly how it's done, but it's fairly straightforward once you have compiled the grammar. GoldParser also has a very intuitive IDE for writing grammars and a huge set of existing ones, such as SQL, C#, Java and, I believe, even a Perl regex grammar.
It's not the one-hour quick fix you'd get from a regex, though; it's closer to 2-3 days. However, you do learn the 'proper' way of parsing.