I have the following regex, used with the IgnoreCase and Multiline options in .NET:

^.*?\WtextToFind\W.*?$

Given a multiline input like:

1 Some Random Text textToFind
2 Some more "textToFind" random text 
3 Another textToFinddd random text

The current regular expression matches lines 1 and 2. However, I need to skip every line in which textToFind appears inside single or double quotes.

Any ideas how to achieve this?

Thanks!

EDIT:

Explanation: My purpose is to find some method calls inside VBScript code. I thought this would be irrelevant for my question, but after reading the comments I realised I should explain this.

So basically I want to skip text that is between double quotes or single quotes, as well as all the text between a single quote and the end of the line, since that would be a comment in VBScript. If I'm looking for myFunc:

Call myFunc("parameter") // should match
Call anotherFunc("myFunc") //should not match
Call someFunc("parameter") 'Comment myFunc //should not match
If(myFunc("parameter") And someFunc("myFunc")) //should match
margabit
  • 2
    directly inside like the example, or also `"like textToFind this"`? Also, if the above should be ignored, what about something like `"hello" textToFind "goodbye"`? – Smern Aug 23 '13 at 13:13
  • Mmmm! Good one @AustinSalonen! Buff.. I'm seeing more and more requirements with your comments.. I'll update it – margabit Aug 23 '13 at 13:22
  • Updated my question with full explanation.. – margabit Aug 23 '13 at 13:29
  • 1
    Nice update. You should understand what the [XY Problem](http://meta.stackexchange.com/a/66378) is and why you shouldn't pose a question like you originally did. – Austin Salonen Aug 23 '13 at 13:32
  • 1
    I have read about the XY problem before. I presented the problem like this since I was only thinking of one requirement and I thought that would be easier to understand. As soon as I discovered that I was missing a requirement, I updated my question. – margabit Aug 23 '13 at 13:37
  • 1
    This may be a stupid/irrelevant question, but don't you have an IDE that can hunt down all calls to `myFunc()`? – Michelle Aug 23 '13 at 13:42
  • 1
    In ASP Classic? I Wish... :) – margabit Aug 23 '13 at 13:59

2 Answers

With all of the possible cases involving mixed sets of quotes, a regex may not be your best option here. What you could do instead (after using your current regex to find candidate lines, without trying to handle quotes in the pattern) is count the number of quotes before and after the occurrence of textToFind. If both counts are odd, then you have quotes around your keyword and should scrap the line. If both are even, you've got matched quotes elsewhere (or no quotes at all), and should keep the line. Then repeat the process for double quotes. You could do all this while walking through the string only once.
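
The counting idea can be sketched roughly as follows. This is Python rather than .NET, chosen only for brevity, and the function name `keyword_is_quoted` is hypothetical. It assumes a plain, case-sensitive search for the first occurrence of the keyword and does not handle escaped quotes or comments.

```python
def keyword_is_quoted(line: str, keyword: str) -> bool:
    """Return True if the first occurrence of keyword appears to sit inside quotes.

    Assumption: an odd number of a given quote character on BOTH sides of the
    keyword means the keyword is enclosed by that kind of quote.
    """
    pos = line.find(keyword)
    if pos == -1:
        return False  # keyword not present, so nothing to reject

    before = line[:pos]
    after = line[pos + len(keyword):]

    # Run the same parity check for double quotes, then single quotes.
    for quote in ('"', "'"):
        if before.count(quote) % 2 == 1 and after.count(quote) % 2 == 1:
            return True
    return False


lines = [
    '1 Some Random Text textToFind',
    '2 Some more "textToFind" random text',
    '"hello" textToFind "goodbye"',
]
kept = [ln for ln in lines if not keyword_is_quoted(ln, "textToFind")]
```

Here `kept` holds the first and third lines: the second is dropped because one double quote sits on each side of the keyword, while the third has two on each side.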

Edit to address the update that you're searching through code: There are some additional considerations to take into account.

  • Escaped quotes (skip over the character after an escape character, and it won't be counted).
  • Commented quotes, e.g. /* " */ in the middle of a line. When you hit a /*, just jump to the next occurrence of */ and then continue inspecting characters. You may also want to check whether the occurrence of textToFind is in a comment.
  • End-of-line ' quotes - if it occurs (outside a literal string) before the keyword, it's not a valid method call.

The bottom line is still that regexes aren't the droids you're looking for, here. You're better off walking through lines and parsing them.
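
A minimal sketch of such a line walker, written in Python for brevity rather than .NET. The helper names `code_outside_strings` and `calls` are hypothetical. It assumes VBScript rules as I understand them: string literals use double quotes only, an embedded quote is written as a doubled `""`, and a `'` outside a string starts a comment that runs to the end of the line. `Rem` comments and line continuations are not handled.

```python
import re


def code_outside_strings(line: str) -> str:
    """Blank out string literals and drop a trailing ' comment from one line."""
    out = []
    in_string = False
    for ch in line:
        if ch == '"':
            # A doubled "" inside a string toggles out and straight back in,
            # so the text that follows is still treated as string content.
            in_string = not in_string
            out.append(" ")
        elif in_string:
            out.append(" ")  # hide string contents from the search
        elif ch == "'":
            break  # the rest of the line is a comment
        else:
            out.append(ch)
    return "".join(out)


def calls(line: str, name: str) -> bool:
    """Return True if name occurs as a whole word in the code part of the line."""
    pattern = r"\b" + re.escape(name) + r"\b"
    # VBScript identifiers are case-insensitive, hence IGNORECASE.
    return re.search(pattern, code_outside_strings(line), re.IGNORECASE) is not None


samples = [
    'Call myFunc("parameter")',
    'Call anotherFunc("myFunc")',
    'Call someFunc("parameter") \'Comment myFunc',
    'If(myFunc("parameter") And someFunc("myFunc"))',
]
results = [calls(s, "myFunc") for s in samples]
```

For the four lines from the question, `results` is `[True, False, False, True]`. Because the search only requires a word boundary, it does not depend on the name being followed by `(`.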

Michelle
  • Edited answer to offer a different (non-regex) solution. Didn't catch your most recent update first - there's an added issue of not counting escaped quotes as well, but you should be able to detect that fairly easily (e.g. skip the next character when you hit a backslash). – Michelle Aug 23 '13 at 13:39
  • `notthefunc(""); /* " */ myfunc("");` – JDB Aug 23 '13 at 13:40

It seems like this should work for your actual implementation in all the examples you've given:

/\bmyFunc\(/

Demonstration - view console.

This works as long as you don't have something like "i'm going to call myFunc()" in a string. But if you start trying to deal with quotes, multiple quotes, nested quotes, and so on, it will get very messy (like trying to parse the DOM with regex).

Also, it appears that you are checking VBScript code. Comments in VBScript start with a ', right? You could check for this as well. Since it looks like you are working line by line, this should work for that type of comment:

/^\s*[^'].*\bmyFunc\(/

Demo
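
For readers without the demos, here is a rough check of both patterns against the lines from the question, in Python (the regex syntax is the same for these patterns). The constant names are made up for illustration, and `LOOKAHEAD_VARIANT` is only a possible alternative, not part of the answer above.

```python
import re

CALL = re.compile(r"\bmyFunc\(")
NOT_LEADING_COMMENT = re.compile(r"^\s*[^'].*\bmyFunc\(")
# Possible variant (an assumption, not from the answer): reject the line up
# front if its first non-whitespace character is a single quote.
LOOKAHEAD_VARIANT = re.compile(r"^(?!\s*').*\bmyFunc\(")

samples = [
    'Call myFunc("parameter")',
    'Call anotherFunc("myFunc")',
    'Call someFunc("parameter") \'Comment myFunc',
    'If(myFunc("parameter") And someFunc("myFunc"))',
]
call_hits = [bool(CALL.search(s)) for s in samples]

comment_line = "'Call myFunc(\"x\")"
indented_comment = "  'Call myFunc(\"x\")"
bare_call = 'myFunc("x")'

checks = {
    "comment skipped": NOT_LEADING_COMMENT.search(comment_line) is None,
    "indented comment skipped": NOT_LEADING_COMMENT.search(indented_comment) is None,
    "bare call found": NOT_LEADING_COMMENT.search(bare_call) is not None,
}
```

`call_hits` comes out as `[True, False, False, True]`, which agrees with the question's annotations. Two edge cases are worth knowing about with the second pattern. An indented comment line still matches, because `\s*` can give back a space for `[^']` to consume. A call at the very start of a line does not match, because `[^']` consumes the `m`. The lookahead variant avoids both, but like the original it ignores trailing comments and string contents.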

Smern
  • I can see references to `myFunc()` in comments being a totally valid case. – Michelle Aug 23 '13 at 13:49
  • Another problem is that not all methods in VBScript are followed by a `(`. It's not a wrong answer since I didn't specify this in the question. – margabit Aug 23 '13 at 13:57
  • @margabit, ah, I've never programmed in VBScript and actually had to look up what comments look like. In any case, you could use a simpler regex (even just `\bmyFunc\W` for example) to filter lines that contain the function name you are looking for and then from there parse just those lines to make sure they fit your more complex criteria. – Smern Aug 23 '13 at 14:03