Let's say we are matching `FIND files where file2=29 AND file32="12" OR file623134="file23"`.
By way of explanation, I'll do this in steps.
Obviously, a regex that is exactly the string itself would match it.
FIND files where file2=29 AND file32="12" OR file623134="file23"

First, let's decide which bits we want to read from it, and make them accessible.
FIND (files) where file(2)=(29) AND file(32)=("12") OR file(623134)=("file23")

Here we stick brackets around all the bits that we want to read out. This defines those bits as "capture groups". In C# we can give them names. We will do that later.
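As a rough illustration of what "reading out" those bits looks like in C#, here is a minimal sketch using the literal pattern from this step. The variable names are just for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// The literal pattern with brackets around the parts to read out.
// In a C# verbatim string (@"..."), a literal quote is written as "".
var pattern = @"FIND (files) where file(2)=(29) AND file(32)=(""12"") OR file(623134)=(""file23"")";
var input = "FIND files where file2=29 AND file32=\"12\" OR file623134=\"file23\"";

var match = Regex.Match(input, pattern);

// Group 0 is the whole match; the bracketed parts are numbered from 1, left to right.
for (int i = 1; i < match.Groups.Count; i++)
{
    Console.WriteLine($"{i}: {match.Groups[i].Value}");
}
```

Note that groups 5 and 7 include the surrounding quotes, because the brackets were placed around them.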
Now let's generalize this regex so it matches more examples. The keys are numbers, so we can capture them with `[0-9]+`. This means: match a character in the range 0 to 9, at least once.
FIND (files) where file([0-9]+)=(29) AND file([0-9]+)=("12") OR file([0-9]+)=("file23")
OK, now the values. Some here are strings, so let's match those. A string is stuff that is not a `"`, surrounded by `"`s: `"[^"]+"`.
(Note: the plus means we can't match empty strings, as we need at least one character. A `*` would let you match empty strings.)
FIND (files) where file([0-9]+)=(29) AND file([0-9]+)=("[^"]+") OR file([0-9]+)=("[^"]+")
![FIND (files) where file([0-9]+)=(29) AND file([0-9]+)=("[^"]+") OR file([0-9]+)=("[^"]+")](../../images/3836939434.webp)
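The difference between `+` and `*` in the string matcher can be checked in isolation. This is a small sketch; the `^` and `$` anchors are my addition so that each test string has to match in full.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ are added here only so each test string must match in full.
var atLeastOne = new Regex(@"^""[^""]+""$");   // "[^"]+"
var zeroOrMore = new Regex(@"^""[^""]*""$");   // "[^"]*"

var nonEmpty = "\"12\"";  // the four characters "12"
var empty = "\"\"";       // just two quote characters

Console.WriteLine(atLeastOne.IsMatch(nonEmpty)); // True
Console.WriteLine(atLeastOne.IsMatch(empty));    // False
Console.WriteLine(zeroOrMore.IsMatch(empty));    // True
```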
One of the values in this example is a number, so let's assume they can be integers.
FIND (files) where file([0-9]+)=([0-9]+) AND file([0-9]+)=("[^"]+") OR file([0-9]+)=("[^"]+")
![FIND (files) where file([0-9]+)=([0-9]+) AND file([0-9]+)=("[^"]+") OR file([0-9]+)=("[^"]+")](../../images/3805547626.webp)
Nothing makes the first example special, so let's assume all values could be strings or integers. To offer two options we use the `|` alternation operator. (Now, I guess you are yelling at the screen "No, they can be anything, not just strings and numbers", but that's OK. I'll deal with that later too.)
FIND (files) where file([0-9]+)=("[^"]+"|[0-9]+) AND file([0-9]+)=("[^"]+"|[0-9]+) OR file([0-9]+)=("[^"]+"|[0-9]+)
![FIND (files) where file([0-9]+)=("[^"]+"|[0-9]+) AND file([0-9]+)=("[^"]+"|[0-9]+) OR file([0-9]+)=("[^"]+"|[0-9]+)](../../images/3811511406.webp)
Now we have a fair bit of duplication here. The last two parts are the same, except one has "OR" and the other has "AND". This is significant: we want to know which operator is being used, so let's capture that too.
FIND (files) where file([0-9]+)=("[^"]+"|[0-9]+) (AND) file([0-9]+)=("[^"]+"|[0-9]+) (OR) file([0-9]+)=("[^"]+"|[0-9]+)
![FIND (files) where file([0-9]+)=("[^"]+"|[0-9]+) (AND) file([0-9]+)=("[^"]+"|[0-9]+) (OR) file([0-9]+)=("[^"]+"|[0-9]+)](../../images/3827567762.webp)
Now we can factor out the duplication by removing the last part and saying it's a repeat of the previous key/value pair.
FIND (files) where file([0-9]+)=("[^"]+"|[0-9]+)( (AND|OR) file([0-9]+)=("[^"]+"|[0-9]+))*
![FIND (files) where file([0-9]+)=("[^"]+"|[0-9]+)( (AND|OR) file([0-9]+)=("[^"]+"|[0-9]+))*](../../images/3836939425.webp)
I've added a "*" as that last part of the expression could be repeated as many times as needed, or not be there at all.
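Once a group sits inside a repeated part, it can match several times. In .NET each group keeps every repetition in its `Captures` collection (many other regex engines keep only the last one). A minimal sketch of reading them by group number, with variable names chosen for this example:

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"FIND (files) where file([0-9]+)=(""[^""]+""|[0-9]+)( (AND|OR) file([0-9]+)=(""[^""]+""|[0-9]+))*";
var input = "FIND files where file2=29 AND file32=\"12\" OR file623134=\"file23\"";

var match = Regex.Match(input, pattern);

// Groups 2 and 3 are the first key/value pair.
Console.WriteLine($"{match.Groups[2].Value} = {match.Groups[3].Value}");

// Groups 5, 6 and 7 sit inside the repeated group 4,
// so each one holds one capture per repetition.
var operators = match.Groups[5].Captures;
var keys = match.Groups[6].Captures;
var values = match.Groups[7].Captures;
for (int i = 0; i < operators.Count; i++)
{
    Console.WriteLine($"{operators[i].Value} {keys[i].Value} = {values[i].Value}");
}
```

For the example query this prints `2 = 29`, then `AND 32 = "12"`, then `OR 623134 = "file23"`.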
Now... If we want to handle the value being anything, float, time, etc. we either need to include matches for each, or a general "anything" matcher. Both have downsides. If we match all types explicitly, we have more work to do. If we don't then we need to make some assumptions about "how do we know when the value is finished?"
Say we assume there will be whitespace after the value. Then we can match all characters until we hit whitespace: `[^\s]+`
FIND (files) where file([0-9]+)=([^\s]+)( (AND|OR) file([0-9]+)=([^\s]+))*
![FIND (files) where file([0-9]+)=([^\s]+)( (AND|OR) file([0-9]+)=([^\s]+))*](../../images/3819244681.webp)
But now, if the value is a string and it contains whitespace, it breaks.
We probably want to handle strings separately to fix this.
FIND (files) where file([0-9]+)=("[^"]+"|[^\s]+)( (AND|OR) file([0-9]+)=("[^"]+"|[^\s]+))*
![FIND (files) where file([0-9]+)=("[^"]+"|[^\s]+)( (AND|OR) file([0-9]+)=("[^"]+"|[^\s]+))*](../../images/3843886275.webp)
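To see the whitespace problem and the fix side by side, here is a small sketch that applies just the key/value part of each pattern to a quoted value containing a space. The sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var whitespaceOnly = new Regex(@"file([0-9]+)=([^\s]+)");
var stringFirst = new Regex(@"file([0-9]+)=(""[^""]+""|[^\s]+)");

var input = "file32=\"hello world\"";

Console.WriteLine(whitespaceOnly.Match(input).Groups[2].Value); // "hello
Console.WriteLine(stringFirst.Match(input).Groups[2].Value);    // "hello world"

// An unquoted value still falls through to the second alternative.
Console.WriteLine(stringFirst.Match("file7=12.5").Groups[2].Value); // 12.5
```

The order of the alternatives matters here: the quoted-string option is tried first, so it wins whenever the value starts with a quote and has a closing quote.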
"[^"]+"
doesn't handle escaped characters within your strings. A better matcher is `"(\\"|[^"])+"`, which says: quote, then either an escaped quote or a non-quote, repeatedly, then quote. Using this would add a new capture group to your expression. We don't need that, so we can tell it not to capture this group by adding a `?:` inside the brackets, e.g. `"(?:\\"|[^"])+"`.
FIND (files) where file([0-9]+)=("(?:\\"|[^"])+"|[^\s]+)( (AND|OR) file([0-9]+)=("(?:\\"|[^"])+"|[^\s]+))*
![FIND (files) where file([0-9]+)=("(?:\"|[^"])+"|[^\s]+)( (AND|OR) file([0-9]+)=("(?:\"|[^"])+"|[^\s]+))*](../../images/3832286382.webp)
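Here is a short sketch comparing the two string matchers on a value that contains escaped quotes. The sample text is invented for the example; `GetGroupNumbers` is used only to show that `(?:...)` adds no capture group.

```csharp
using System;
using System.Text.RegularExpressions;

var simple = new Regex(@"""[^""]+""");             // "[^"]+"
var escaped = new Regex(@"""(?:\\""|[^""])+""");   // "(?:\\"|[^"])+"

// The text being searched is: "say \"hi\" now"
var input = "\"say \\\"hi\\\" now\"";

Console.WriteLine(simple.Match(input).Value);   // stops at the first escaped quote
Console.WriteLine(escaped.Match(input).Value);  // the whole quoted string

// (?: ... ) groups without capturing, so only group 0 (the whole match) exists.
Console.WriteLine(escaped.GetGroupNumbers().Length); // 1
```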
As I mentioned, in C# you can name capture groups. You do this by adding `?<name>` just inside the group's opening bracket.
FIND (?<table>files) where file(?<key>[0-9]+)=(?<value>"(?:\\"|[^"])+"|[^\s]+)( (?<operator>AND|OR) file(?<key>[0-9]+)=(?<value>"(?:\\"|[^"])+"|[^\s]+))*
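A minimal sketch of reading the named groups from this expression. In .NET, groups that share a name (the two `key` groups, for instance) collect their captures together in match order, so all three keys end up in one collection.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"FIND (?<table>files) where file(?<key>[0-9]+)=(?<value>""(?:\\""|[^""])+""|[^\s]+)( (?<operator>AND|OR) file(?<key>[0-9]+)=(?<value>""(?:\\""|[^""])+""|[^\s]+))*";
var input = "FIND files where file2=29 AND file32=\"12\" OR file623134=\"file23\"";

var match = Regex.Match(input, pattern);

Console.WriteLine(match.Groups["table"].Value);

// Both (?<key>...) groups share one name, so their captures are listed together.
foreach (Capture key in match.Groups["key"].Captures)
{
    Console.WriteLine(key.Value);
}

// There is one fewer operator than keys: the first pair follows "where".
Console.WriteLine(match.Groups["operator"].Captures.Count);
```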
There is still duplication in this expression, but if we took it out, we would be allowing invalid expressions to match, e.g.
FIND (?<table>files)( (?<operator>AND|OR|where) file(?<key>[0-9]+)=(?<value>"(?:\\"|[^"])+"|[^\s]+))+
This would allow `FIND files AND file2="test"` to match, which isn't really what you want, but may be good enough.
I would probably just use string concatenation to remove the duplication:
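// In a verbatim string (@"..."), a literal quote is written as "" and backslashes are not escaped.
var pair = @"(?<pair>file(?<key>[0-9]+)=(?<value>""(?:\\""|[^""])+""|[^\s]+))";
var query = @"FIND (?<table>files) where " + pair + "( (?<operator>AND|OR) " + pair + ")*";
var ex = new Regex(query);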
Or just add a check in code to make sure the first operator is "where":
![FIND (files)( (AND|OR|where) file([0-9]+)=("(?:\\"|[^\"])+\"|[^\s]+))+](../../images/3833531537.webp)
var query = @"FIND (?<table>files)(?<condition> (?<operator>AND|OR|where) file(?<key>[0-9]+)=(?<value>""(?:\\""|[^""])+""|[^\s]+))+";
var ex = new Regex(query);
var match = ex.Match(...);
... match.Groups["table"].Value ...
You can now match a string, loop through the captures of the `condition` group, and read the corresponding `operator`, `key`, and `value` captures.
see How do I access named capturing groups in a .NET Regex?
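Putting the last variant together, the loop and the "first operator is where" check might look roughly like this. `TryPrintConditions` is a hypothetical helper name invented for this sketch, not something from .NET. The sketch relies on the `operator`, `key`, and `value` groups each getting exactly one capture per repetition, so their `Captures` collections line up by index.

```csharp
using System;
using System.Text.RegularExpressions;

var query = @"FIND (?<table>files)(?<condition> (?<operator>AND|OR|where) file(?<key>[0-9]+)=(?<value>""(?:\\""|[^""])+""|[^\s]+))+";
var ex = new Regex(query);

// Hypothetical helper: prints the conditions and returns false when the
// text does not match or the first operator is not "where".
bool TryPrintConditions(string text)
{
    var match = ex.Match(text);
    if (!match.Success) return false;

    var operators = match.Groups["operator"].Captures;
    var keys = match.Groups["key"].Captures;
    var values = match.Groups["value"].Captures;

    // The code check: the regex alone would accept "FIND files AND ...".
    if (operators[0].Value != "where") return false;

    var table = match.Groups["table"].Value;
    Console.WriteLine($"table: {table}");
    for (int i = 0; i < operators.Count; i++)
    {
        Console.WriteLine($"{operators[i].Value}: file{keys[i].Value} = {values[i].Value}");
    }
    return true;
}

Console.WriteLine(TryPrintConditions("FIND files where file2=29 AND file32=\"12\" OR file623134=\"file23\""));
Console.WriteLine(TryPrintConditions("FIND files AND file2=\"test\""));
```

The first call prints the three conditions and then `True`; the second prints only `False`, because the regex matches but the check on the first operator rejects it.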