Let's say I have the following regex:

Console.writeline.+(?!;)

I want to find any line that contains "console.writeline" followed by any character one or more times but does not end with a semicolon.

When I test this regex with the following string:

Console.WriteLine("Final total count of missing VMs: {0}", missingVms.Count);

It matches. However, that string ends in a semicolon, so shouldn't the match fail?

I realize I could use [^;], but I am more curious why checking for a semicolon doesn't seem to work with negative lookaheads in .NET.

Edit: To clarify:

Let's say I am using Visual Studio's Find and Replace tool and I want to find and comment out every instance of Console.WriteLine(...). However, I can wind up with situations where Console.WriteLine(...) spans multiple lines, like so:

Console.WriteLine("Adding drive to VM with ID: {0}. Drive HostVMID is {1}",
     vm.ID, drive.HostVmId);

These can go on for 2, 3, 4, or more lines and finally end with ); to close the statement. I can also have single-line calls that are immediately followed by important blocks of code:

Console.WriteLine("Creating snapshot for VM: {0} {1}", dbVm.ID, dbVm.VmName);
dbContext.Add(new RTVirtualMachineSnapshot(dbVm));

So what I want to do is come up with a regex statement that will find both the first type of instances of Console.WriteLine as well as simple single-line instances of it.

The Regex that I got from one of the answers to this question was

Console\.writeline(?>.+)(?<!;)

Which matches any line that contains Console.WriteLine but does not end with a semicolon. However, I need it to continue until it finally reaches a closing parenthesis followed by a semicolon.

I've tried the following regex:

(Console\.writeline(?>.+)(?<!\);)

However, I think that's incorrect because it still matches only the first line and doesn't capture the following lines when the WriteLine call spans multiple lines.

At the end of the day I want to be able to capture a full Console.writeline statement regardless of how many lines it spans using Visual Studio's find and replace feature and I am a little confused on the regex I would need to use to do this.

user2357446
  • By default, `.` matches all characters except certain line break characters. You can change the meaning of the `.` by specifying the single-line flag, which will force the `.` to match *all* characters, including newline. – JDB Dec 09 '15 at 19:30
  • @JDB How can I use that with VS's find and replace to do what I am trying to do (as I explained in my edit)? I am trying to find every instance of "Console.writeline(...);" even if it spans multiple lines. I want to then take that whole match and use the group to replace it with the same text, only commented out (two slashes in front). – user2357446 Dec 09 '15 at 20:39
  • You should have mentioned the type of task at the very start. You need a bit more specific regex. A dot cannot be used to span across lines in Find and replace tool in VS. – Wiktor Stribiżew Dec 09 '15 at 20:51
  • You *could* try using `(?s)` at the beginning of your regex. In .NET, this sets the single-line flag. Don't know if it'll work in VS though. See https://msdn.microsoft.com/en-us/library/x044wc7s.aspx – JDB Dec 09 '15 at 20:55
  • @JDB: No, `(?s)` does not work there. I am now checking the regex in VS. – Wiktor Stribiżew Dec 09 '15 at 20:56
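
The behaviour of `.` discussed in these comments can be checked outside Visual Studio. The following is a minimal sketch using Python's `re` module as a stand-in for the .NET engine (an assumption: both treat `.` as "anything but `\n`" by default, and both accept the inline `(?s)` flag). It says nothing about what the VS Find and Replace box accepts. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical two-line call, joined by a plain "\n".
text = 'Console.WriteLine("a {0}",\n     x);'

# Default mode: "." stops at the newline, so only the first line is matched.
default_match = re.search(r"Console\.writeline.+", text, re.IGNORECASE)

# Single-line mode via the inline (?s) flag: "." also matches "\n".
single_line_match = re.search(r"(?s)Console\.writeline.+", text, re.IGNORECASE)

print(repr(default_match.group(0)))
print(repr(single_line_match.group(0)))
```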

1 Answer

You can use the following regex with a lookbehind:

Console\.writeline(?>.+)(?<!;)

Here is a demo (note case insensitive flag)

The regex you use, Console.writeline.+(?!;), checks for a ; only after the last character that .+ has matched. Since .+ is greedy, it runs to the end of the line, where nothing follows, so the lookahead succeeds and the whole line is matched. A lookbehind (?<!;) instead checks whether the character just before the current position (here, the end of the line) is a ;. But to make sure a line ending with ; does not match at all, you need an atomic group (?>.+) to prevent backtracking. Without the atomic group, .+ gives back the final ;, the lookbehind then sees the character before it (which is not a ;) and succeeds, and a partial match (the line minus its last ;) is returned.
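
The three behaviours described above can be compared side by side. This is a minimal sketch in Python's `re` module rather than .NET (an assumption made so the example is easy to run; the lookaround semantics involved are the same). Because the atomic-group syntax `(?>...)` is only available in newer Python versions, the sketch swaps in the equivalent lookahead-plus-backreference idiom `(?=(.+))\1`, which also forbids backtracking into `.+`. The variable names are illustrative only.

```python
import re

line = 'Console.WriteLine("Final total count of missing VMs: {0}", missingVms.Count);'
open_line = 'Console.WriteLine("Adding drive to VM with ID: {0}. Drive HostVMID is {1}",'

# 1. Negative lookahead: .+ runs to the end of the line, nothing follows,
#    so (?!;) succeeds and the whole line is matched.
lookahead = re.search(r"Console\.writeline.+(?!;)", line, re.IGNORECASE)

# 2. Lookbehind without an atomic group: .+ gives back the final ";",
#    the lookbehind then sees ")" and a partial match is returned.
lookbehind = re.search(r"Console\.writeline.+(?<!;)", line, re.IGNORECASE)

# 3. Atomic behaviour, emulated with (?=(.+))\1 in place of (?>.+):
#    no backtracking into .+, so a line ending in ";" does not match at all.
atomic_pattern = r"Console\.writeline(?=(.+))\1(?<!;)"
atomic = re.search(atomic_pattern, line, re.IGNORECASE)
atomic_open = re.search(atomic_pattern, open_line, re.IGNORECASE)

print(lookahead.group(0) == line)        # whole line, semicolon included
print(lookbehind.group(0) == line[:-1])  # whole line minus the last ";"
print(atomic is None)                    # no match for the terminated line
print(atomic_open.group(0) == open_line) # unterminated line still matches
```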

UPDATE

To comment out Console.WriteLines in VS Find and Replace you can use

Console\.writeline\s*\([\s\S\r]+?\);(?=$|\r?\n)

or (the same, but an unroll-the-loop technique compliant, with optional whitespace to make it safer):

Console\.writeline\s*\([^)]*(?:\)(?!\s*;\s*(?:$|\r?\n))[^)]*)*\)\s*;\s*(?=$|\r?\n)

and replace with

/* $0 */

In VS Find and Replace, you should use [\s\S\r] to match any symbol including a newline.
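
To see what the two multi-line patterns and the replacement do, here is a minimal sketch that applies them to the sample lines from the question. It uses Python's `re.sub` as a stand-in for the VS Find and Replace box (an assumption: the box may treat `$` and line breaks differently), and `\g<0>` is Python's spelling of the `$0` whole-match token. The variable names are illustrative only.

```python
import re

source = (
    'Console.WriteLine("Adding drive to VM with ID: {0}. Drive HostVMID is {1}",\n'
    '     vm.ID, drive.HostVmId);\n'
    'Console.WriteLine("Creating snapshot for VM: {0} {1}", dbVm.ID, dbVm.VmName);\n'
    'dbContext.Add(new RTVirtualMachineSnapshot(dbVm));'
)

# Lazy version: consume as little as possible up to ");" at the end of a line.
lazy = r"Console\.writeline\s*\([\s\S\r]+?\);(?=$|\r?\n)"

# Unrolled version: skip non-")" runs, allowing ")" only when it is not
# the statement-closing ");" at the end of a line.
unrolled = (
    r"Console\.writeline\s*\([^)]*"
    r"(?:\)(?!\s*;\s*(?:$|\r?\n))[^)]*)*"
    r"\)\s*;\s*(?=$|\r?\n)"
)

# Wrap each whole match in a block comment, mirroring "/* $0 */".
commented_lazy = re.sub(lazy, r"/* \g<0> */", source, flags=re.IGNORECASE)
commented_unrolled = re.sub(unrolled, r"/* \g<0> */", source, flags=re.IGNORECASE)

print(commented_lazy)
```

In this sample both patterns wrap the two-line call and the single-line call, and leave the dbContext.Add(...) line untouched.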

Wiktor Stribiżew
  • I see. So simple! Thanks @stribizhev – user2357446 Dec 09 '15 at 17:57
  • Could I use this same lookbehind to check (and match) instances where Console.writeline goes into the next line? Currently the regex matches lines that contain Console.writeline that dont end in a semicolon, but what if I wanted to continue on and find the next line until I DO reach a semicolon? – user2357446 Dec 09 '15 at 18:05
  • Check [this demo](http://regexstorm.net/tester?p=(Console%5c.writeline(%3f%3e.%2b)(%3f%3c!%3b)(%3f%3a%5cr%3f%5cn%7c%24))%2b&i=Console.WriteLine(%22+VMs%3a+%7b0%7d%22%2c+missingVms.Count)%0d%0aConsole.WriteLine(%22+%7b0%7d%22%2c+missingVms.Count)%0d%0aConsole.WriteLine(%22f+missing+VMs%3a+%7b0%7d%22%2c+missingVms.Count)%3b&o=i), but I am not sure what you need. – Wiktor Stribiżew Dec 09 '15 at 18:24
  • I edited my original comment with what I am trying to do. – user2357446 Dec 09 '15 at 19:15
  • Alternative: `Console\.writeline.+(?<!;)$` – JDB Dec 09 '15 at 19:27
  • `[\s\S\r]` include everything and carriage return? And what is a `unroll-the-loop technique compliant` compliant what? –  Dec 09 '15 at 22:37
  • @sln: `[\s\S]` does not match a `\r` in VS S&R box. If you have VisualStudio, please try and let me know if it works for you. As for the unroll-the-loop technique, it is a way of making lazy patterns work more efficiently with very long texts eliminating the timeout issue. See [*Mastering Regular Expressions Powerful Techniques for Perl and Other Tools*](http://ww2.ii.uj.edu.pl/~tabor/prII09-10/perl/master.pdf) by Jeffrey E.F. Friedl, Page 162. – Wiktor Stribiżew Dec 09 '15 at 22:43
  • I have VisualStudio 2010 and I don't use their arcane regex. –  Dec 09 '15 at 22:48
  • @stribizhev - Well, you know if `[\S\s]` doesn't match carriage return, which class excludes/includes it? If that were the case `\W` would not match CR's (`[\W\w]`). Then the regex world is a crappy place. Also, please quote me the page number in Friedl's book defining the `unroll-the-loop technique compliant`, especially the _timeout issue_. –  Dec 10 '15 at 02:19
  • You can [see here](https://imgur.com/EDJ9zzK) that `[\w\W]*?` does not match newlines. No class includes a carriage return in VS S&R. When I say *compliant* here, I mean that the technique is used to write the pattern (consider it "compliant"). The timeout issue can be seen [here with a simple `/\(\[.*?\]\)/s` regex](https://regex101.com/r/qO2xU9/8). I have not found any good reference for the timeout issue, but Casimir et Hyppolyte has told me not to confuse catastrophic backtracking (caused by greedy patterns) and the timeout issue (with lazy patterns). – Wiktor Stribiżew Dec 10 '15 at 07:49
  • It is hard to search for comments :( Mariano says that `.*?` causes a [plain old *O(n)* timeout](http://stackoverflow.com/questions/33507624/catastophic-backtracking-issue-with-html/33508600#comment54801248_33508600). Casimir says [What makes a non-greedy quantifier slow is that before taking each character, the end of the pattern must be tested until it fails. There's no risk of overflow.](http://stackoverflow.com/questions/33805752/regular-expression-search-avoid-nested-results/33805935#comment55377548_33805935). I cannot find the post where Casimir told me about the timeout issue. – Wiktor Stribiżew Dec 10 '15 at 08:15