Our team's web application uses Entity Framework as its ORM. When we run queries, we typically chain the IQueryable methods Entity Framework offers, beginning with a method Items(), which is our project-specific model projection of the DbSets. I've been trying to write a regex pattern that finds queries that don't use a Select method.

This is what our code typically looks like

Cars.Items()
  .Where(x => x.Year == "1988")
  .Select(x => new { x.Registration })
  .ToList();

Cars.Items()
  .Where(x => x.Id == 1923984)
  .Select(x => new { x.Registration })
  .FirstOrDefault();

This is the kind of query I'd like to find

Cars.Items()
  .Where(x => x.Id == 1923984)
  .FirstOrDefault();

I've tried using a negative lookahead to exclude queries that have a Select() method, but those queries are still matched, and I'm struggling to think of an alternative approach.

\.Items\(.*\)(\s*\..*\(.*\))*(?!\.Select\(.+\))(\s*\..*\(.*\))\;

I'll break down my logic

  • \.Items\(.*\) All queries begin with this method
  • (\s*\..*\(.*\))* Any number of chained IQueryable methods
  • (?!\.Select\(.+\)) Should exclude a Select() method
  • (\s*\..*\(.*\)) This could be a First(), FirstOrDefault(), Single() or similar
  • \; Ending the query
Expotr
  • This is too tough a job for a regex. Heck, this isn't trivial even with a fully working C# parser! – Sergey Kalinichenko Jan 30 '17 at 14:55
  • Now that @dasblinkenlight mentions it: it might be worth looking into the Roslyn compiler API to parse C# and look for this kind of code structure automatically (within Visual Studio). – Measurity Jan 30 '17 at 14:57
  • Do you think what I've tried with the negative lookahead is possible? The pattern I'm looking for doesn't have to be sophisticated. – Expotr Jan 30 '17 at 15:00
  • Try this `\.Items\(.*?\)(\W.*(?!\.Select\(.+\)))(\W.*)\;` – vendettamit Jan 30 '17 at 15:15

2 Answers

The reason the negative lookahead isn't working is that you have already captured everything with this part:

\.Items\(.*\)(\s*\..*\(.*\))*

You need to apply the negative lookahead first and then match everything else. Try this regex:

\.Items\(.*?\)(\W.*(?!\.Select\(.+\)))?(\W.*)?\;

Try it out here: https://regex101.com/r/vu9mQ0/2
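
To illustrate why the position of the lookahead matters, here is a minimal C# sketch. The two patterns and the single-line sample queries are simplified stand-ins made up for this illustration, not the regexes from the question or from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical single-line stand-ins for the queries in the question.
string withSelect = "Cars.Items().Select(x => x.Id).ToList();";
string withoutSelect = "Cars.Items().Where(x => x.Id == 1).ToList();";

// Lookahead placed after a greedy .* : the engine can backtrack to a position
// where ".Select" does not follow, so the assertion never rejects anything.
var lateLookahead = new Regex(@"\.Items\(\).*(?!\.Select).*;");

// Lookahead placed right after Items(): it scans the rest of the line
// for ".Select" before anything else is consumed.
var earlyLookahead = new Regex(@"\.Items\(\)(?!.*\.Select).*;");

bool lateMatchesSelect = lateLookahead.IsMatch(withSelect);      // True
bool earlyMatchesSelect = earlyLookahead.IsMatch(withSelect);    // False
bool earlyMatchesPlain = earlyLookahead.IsMatch(withoutSelect);  // True

Console.WriteLine($"late lookahead, query with Select:     {lateMatchesSelect}");
Console.WriteLine($"early lookahead, query with Select:    {earlyMatchesSelect}");
Console.WriteLine($"early lookahead, query without Select: {earlyMatchesPlain}");
```

Note that `.` does not match a newline by default, so a multi-line query would need RegexOptions.Singleline or an explicit `\s*` between the chained calls.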

vendettamit

I've also tried my hand at creating a regex for this. You can test it here: https://regex101.com/r/qRU2sP/5

Regex: Cars(?>\.(?!select)\w+(?:\([^)]*\))?\s*)+\;

The small difference from the answer vendettamit gave is that this one captures the whole statement.

Edit: So how does it work?

I try to capture repeating parts (the .Items() or .Select()) so that the pattern is easier to maintain/follow.

Cars
I've put Cars at the start because this way the regex engine knows as soon as possible whether the next set of characters is interesting. This decreases the number of steps needed. It could also be replaced by \w+, but then more steps would be needed because many more characters in the whole text match that.

(?>
I start an atomic group here. This means that once the group has matched, the regex engine won't backtrack into it to try other ways of matching. This can improve performance a lot, because if a later part of the pattern fails the engine won't retry other possibilities inside the group.
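
As a small aside, the effect of an atomic group can be seen on a toy input. This is only a sketch: the string "aaab" and both patterns are made up for the illustration and are not part of the regex above.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "aaab";

// Ordinary group: a+ first takes "aaa", then gives one "a" back so "ab" can match.
bool plain = Regex.IsMatch(input, @"(?:a+)ab");

// Atomic group: once a+ has taken the a's it cannot give any back,
// so the literal "a" that follows never finds one to match.
bool atomic = Regex.IsMatch(input, @"(?>a+)ab");

Console.WriteLine($"plain: {plain}, atomic: {atomic}"); // plain: True, atomic: False
```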

\.
Match the . character literally.

(?!select)
Negative lookahead: only match if the next characters are not select. I've used the case-insensitive modifier on regex101.com, and C# supports this too (RegexOptions.IgnoreCase). Otherwise just change it to [Ss]elect.
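
A quick sketch of how the case handling plays out in C#, using only this fragment of the pattern. The input string ".Select()" is a made-up example.

```csharp
using System;
using System.Text.RegularExpressions;

string call = ".Select()";

// Case-sensitive: "select" does not equal "Select", so the lookahead lets it through.
bool caseSensitive = Regex.IsMatch(call, @"\.(?!select)\w+");

// With RegexOptions.IgnoreCase the lookahead rejects "Select" as intended.
bool ignoreCase = Regex.IsMatch(call, @"\.(?!select)\w+", RegexOptions.IgnoreCase);

// Alternative without any options: spell out both capitalisations.
bool charClass = Regex.IsMatch(call, @"\.(?![Ss]elect)\w+");

Console.WriteLine($"{caseSensitive} {ignoreCase} {charClass}"); // True False False
```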

\w+
Match any word character (a-zA-Z0-9_) at least once. This matches the method name, such as Items or Where.

(?:
I start a new non-capturing group here. I do this because the next piece of the pattern is optional (the () part of the methods/lambdas).

\(
This just matches the ( character literally.

[^)]*
This matches any run of characters (including none; see the *) up to, but not including, the next ) character.

\)
Literally matches the ) character.
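
To make the behaviour of these three tokens concrete, here is a small sketch of the `\([^)]*\)` part on its own. Both input strings are hypothetical. The second one shows that the match stops at the first `)`, so an argument list containing a nested call is only partly covered.

```csharp
using System;
using System.Text.RegularExpressions;

var argumentList = new Regex(@"\([^)]*\)");

string simple = ".Where(x => x.Id == 1)";
string nested = ".Where(x => x.Name.StartsWith(\"A\"))";

string simpleMatch = argumentList.Match(simple).Value;
string nestedMatch = argumentList.Match(nested).Value;

Console.WriteLine(simpleMatch); // (x => x.Id == 1)
Console.WriteLine(nestedMatch); // (x => x.Name.StartsWith("A")
```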

)?
I close the optional (see ?) group. Remember, this is the part of the () of each method/lambda but it might also be a property so the () part must be optional.

\s*
Matches any whitespace character (like space, tab and new-lines) as many times as possible.

)+
Here we end the group that matches the .Where(x => x) parts of the text (at least once. see +).

\;
Matches the ; character literally.

That's it! So the biggest performance wins are the Cars part and the (?> atomic group.
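
For anyone who wants to try the pattern outside regex101, the following is a minimal C# sketch that runs it over a source string. The sample text is adapted from the question, and the variable names are my own choices for the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample source text: one query with Select and one without.
string source = @"
Cars.Items()
  .Where(x => x.Year == ""1988"")
  .Select(x => new { x.Registration })
  .ToList();

Cars.Items()
  .Where(x => x.Id == 1923984)
  .FirstOrDefault();
";

// IgnoreCase makes the (?!select) lookahead also reject "Select".
var pattern = new Regex(
    @"Cars(?>\.(?!select)\w+(?:\([^)]*\))?\s*)+\;",
    RegexOptions.IgnoreCase);

// Collect the full text of every statement that matched.
string[] hits = pattern.Matches(source)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string hit in hits)
{
    Console.WriteLine(hit);
    Console.WriteLine("----");
}
```

With this input only the second query should be reported, since the first one contains a Select call.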

Measurity