1

I have the following regex:

(?!SELECT|FROM|WHERE|AND|OR|AS|[0-9])(?<= |^|\()([a-zA-Z0-9_]+)

that I'm matching against a string like this:

SELECT Static AS My_alias FROM Table WHERE Id = 400 AND Name = 'Something';

This already does 90% of what I want. What I also like to do is to exclude AS My_alias, where the alias can be any word.

I tried to add this to my regex, but this didn't work:

(?!SELECT|FROM|WHERE|AND|OR|AS [a-zA-Z0-9_]+|[0-9])(?<= |^|\()([a-zA-Z0-9_]+)
                            ^^^^^^^^^^^^^^^^
                          this is the new part

How can I exclude this part of the string using my regex?

Demo of the regex can be found here

ΩmegaMan
  • 29,542
  • 12
  • 100
  • 122
Vivendi
  • 20,047
  • 25
  • 121
  • 196
  • 1
    You can't use an [sql parser](https://stackoverflow.com/questions/589096/parsing-sql-code-in-c-sharp)? – gunr2171 Aug 20 '20 at 15:49
  • 1
    As for the regex, I think you could use `\b(?<!\bAS\s+)(?!(?:SELECT|FROM|WHERE|AND|OR|AS)\b)(?<=[ ^(])[A-Za-z_]\w*` – Wiktor Stribiżew Aug 20 '20 at 15:50
  • I agree with the above suggestion. Even if we give you a patch for your current regex, it is still very simple and would fail for many other types of queries. Write/use a parser here for best results. – Tim Biegeleisen Aug 20 '20 at 15:51

1 Answers1

0

This excludes the AS and gets the tokens you seek. It also handles multiple select values, along zero to many Where clauses.

The thought is to use named explicit captures, and let the regex engine know to disregard any non-named capture groups. (A match but don't capture feature)

We will also put all the "tokens" wanted into one token captures (?<Token> ... ) for all of our token needs.

var data = "SELECT Static AS My_alias FROM Table WHERE Id = 400 AND Name = 'Something';";

var pattern = @"
^
SELECT\s+
   (
     (?<Token>[^\s]+)
     (\sAS\s[^\s]+)?
     [\s,]+
   )+                  # One to many statements
FROM\s+
(?<Token>[^\s]+)       # Table name
(
  \s+WHERE\s+
   (
      (?<Token>[^\s]+)
      (.+?AND\s+)?
   )+                  # One to many conditions
)?                # Optional Where
";

var tokens = 
Regex.Matches(data, pattern, 
              RegexOptions.IgnorePatternWhitespace // Lets us space out/comment pattern
            | RegexOptions.ExplicitCapture)        // Only consume named groups.
     .OfType<Match>()
     .SelectMany(mt => mt.Groups["Token"].Captures // Get the captures inserted into `Token`
                                         .OfType<Capture>()
                                         .Select(cp => cp.Value.ToString()))
     ;

tokens is an array of these strings: { "Static", "Table", "Id", "Name" }


This should get you going on most of the cases of what will find. Use similar logic if you need to process selects with joins; regardless this is a good base to work from going forward.

ΩmegaMan
  • 29,542
  • 12
  • 100
  • 122