I have a bunch of C# source files that I need to analyze outside of VS, and I am struggling with one particular case:

    public static bool InsertNote(string TableName, string TableKey, string DocType, string InsuredKey, string SubmissionKey,
                                    string Staff, string DefaultAction, string DisplayKey, FileType FileType, string IFSFileName,
                                    string IFSFolder, string IFSTimeStamp, string Subject, string Notation, NoteType NoteType,
                                    //string Company, string NoteCategory, ref OracleConnection Connection)
                                    string Company, string NoteCategory, string DocumentName, ref SqlConnection Connection)
    {

I thought that this regex should be able to find it:

    private static readonly Regex MethodNamesExtractor = new Regex(@"^.*(\S*)\({1}.*ref\s*SqlConnection", RegexOptions.Multiline | RegexOptions.Compiled);

But it does not. What am I missing?

Darek

3 Answers

By default, . does not match newlines. You can solve that part of the problem with RegexOptions.Singleline:

private static readonly Regex MethodNamesExtractor = new Regex(@"^.*(\S*)\({1}.*ref\s*SqlConnection", RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.Compiled);

The Multiline option makes ^ and $ match at the beginning and end of every line instead of only at the beginning and end of the whole string; it has no effect on what . matches. The naming is a little confusing, but that's how it is! You can also use the inline modifier (?s), which works just the same as RegexOptions.Singleline. I'll use that in the subsequent regexes and drop Multiline, since nothing in them needs it.
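
To see the two options in isolation, here is a minimal sketch. The two-line Sample string and the SinglelineDemo class are made up for illustration; only Regex.IsMatch and the RegexOptions values come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical input: a signature whose "ref SqlConnection" is on the second line.
    public const string Sample = "Foo(string a,\n    ref SqlConnection c)";

    // The tail of the pattern from the question, without the capture group.
    public const string Pattern = @"\(.*ref\s*SqlConnection";

    public static bool Matches(RegexOptions options) =>
        Regex.IsMatch(Sample, Pattern, options);

    public static void Main()
    {
        // Multiline only changes ^ and $, so .* still stops at the line break.
        Console.WriteLine(Matches(RegexOptions.Multiline));          // False
        // Singleline lets . cross the line break.
        Console.WriteLine(Matches(RegexOptions.Singleline));         // True
        // The inline modifier (?s) behaves like RegexOptions.Singleline.
        Console.WriteLine(Regex.IsMatch(Sample, "(?s)" + Pattern));  // True
    }
}
```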

But that's not the only problem. .* matches greedily, meaning that it will match as much as possible before \S* even has the chance to match something, so the capture group ends up empty. You can fix this by making .* lazy, i.e. by adding a ? to it, or by simply removing the leading ^.*, since it isn't doing much anyway. Also, {1} is redundant, since a single repetition is the default. With both removed:

private static readonly Regex MethodNamesExtractor = new Regex(@"(?s)(\S*)\(.*ref\s*SqlConnection", RegexOptions.Compiled);
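
As a rough check of that claim, the sketch below runs the pattern with the leading ^.* and the simplified one against a shortened, two-line version of the signature from the question. The GreedyDemo class, the FirstGroup helper and the shortened Source string are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Shortened stand-in for the InsertNote signature in the question.
    public const string Source =
        "    public static bool InsertNote(string TableName,\n" +
        "        string Company, ref SqlConnection Connection)\n";

    // Returns the text of capture group 1, or a marker when nothing matches.
    public static string FirstGroup(string pattern)
    {
        Match m = Regex.Match(Source, pattern);
        return m.Success ? m.Groups[1].Value : "<no match>";
    }

    public static void Main()
    {
        // Greedy ^.* swallows the method name, leaving group 1 empty.
        Console.WriteLine("[" + FirstGroup(@"(?s)^.*(\S*)\({1}.*ref\s*SqlConnection") + "]"); // []
        // Without the leading ^.*, \S* gets to consume the name itself.
        Console.WriteLine("[" + FirstGroup(@"(?s)(\S*)\(.*ref\s*SqlConnection") + "]");       // [InsertNote]
    }
}
```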

Now for the tricky part: if you are trying to extract the names of several methods from one file, the above regex will produce only one match, because .* now runs across method boundaries. Let's say there are two methods, where the first one doesn't have the ref SqlConnection part while the second one does. The match then starts at the first method's opening paren and runs on into the second method, so you capture the wrong name.

To fix that, you might want to replace .* with a negated class, [^)]*, so that the match cannot run past the closing paren of the parameter list. You will notice that this alone gives no match on your sample, and that's because the commented-out line in the parameter list has a ) before the ref SqlConnection part appears. You can allow for commented lines like this:

"(?s)(\S*)\((?:[^)]|//[^\r\n]*\))*ref\s*SqlConnection"

That's provided you don't have any 'false' double forward slashes or parens within the parameters. To allow block comments too, the regex becomes longer (and longer still if you want to allow parens within the parameters):

"(?s)(\S*)\((?:[^)]|//[^\r\n]*\)|/\*(?:(?!\*/).)*\*/)*ref\s*SqlConnection"

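A sketch of how that last pattern could be applied to a whole file follows. The MethodNameScan class, the FindNames helper and the two-method Source text are invented for the example; in practice the input would presumably come from something like File.ReadAllText.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MethodNameScan
{
    // The last regex from above, unchanged.
    private static readonly Regex Extractor = new Regex(
        @"(?s)(\S*)\((?:[^)]|//[^\r\n]*\)|/\*(?:(?!\*/).)*\*/)*ref\s*SqlConnection",
        RegexOptions.Compiled);

    // Hypothetical input: one method without the parameter, one with it,
    // plus a line comment and a block comment that both contain ")".
    public const string Source = @"
public static void NoConn(string a, string b)
{
}

public static bool InsertNote(string TableName,
    //string Company, ref OracleConnection Connection)
    /* old (block) comment */
    string Company, ref SqlConnection Connection)
{
}
";

    // Collects capture group 1 (the method name) from every match.
    public static List<string> FindNames(string source)
    {
        var names = new List<string>();
        foreach (Match m in Extractor.Matches(source))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in FindNames(Source))
        {
            Console.WriteLine(name); // InsertNote
        }
    }
}
```

NoConn is skipped because the match is not allowed to run past its closing paren.
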
In conclusion, it might be better to use a dedicated parser to parse a programming language.

Jerry
  • I will try your recommendation for the regex; thanks for the quite elaborate answer. As for the parser ... All I am trying to do is find in how many places the developers f-ed up by passing an SqlConnection by reference in a multi-threaded scenario. – Darek May 07 '14 at 16:42
  • @Darek Ah, I see. Good luck with that task, cleaning up behind those who messed up is never fun... Sorry it's a bit of a wall of text. Sometimes, my answers come out like that ^^; – Jerry May 07 '14 at 16:45

I think if you add RegexOptions.Singleline, it'll do what you want. Here it is on regex101.com

So try the following (translating on the fly from the regex101-style definition):

private static readonly Regex MethodNamesExtractor = new Regex(@"^.*(\S*)\({1}.*ref\s*SqlConnection", RegexOptions.Singleline | RegexOptions.Compiled);

Reason: Multiline has to do with how ^ and $ are interpreted. Singleline, on the other hand, says that . matches newlines, which is what you want since your test text spans multiple lines.

LB2
  • It finds one group and it is empty. – Darek May 07 '14 at 16:10
  • @Darek Of course... your group is `(\S*)` which is defined as _"match any non-white space character [^\r\n\t\f ]"_ (quoting regex101). What do you expect to capture in your group? – LB2 May 07 '14 at 16:14

You need to add a question mark after each star:

private static readonly Regex MethodNamesExtractor = new Regex(@"^.*?(\S*)\({1}.*?ref\s*SqlConnection", RegexOptions.Singleline | RegexOptions.Compiled);

Otherwise the star acts as a greedy quantifier and consumes the method name before the capture group gets a chance to.

http://regex101.com/r/qG5lD3
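
For a quick local check without regex101, something like the following sketch could be used. The LazyDemo class, the FirstGroup helper and the Source string (a shortened copy of the signature from the question, including its commented-out line) are assumptions for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Shortened copy of the question's signature, keeping the commented-out line.
    public const string Source =
        "public static bool InsertNote(string TableName, string TableKey,\n" +
        "    //string Company, string NoteCategory, ref OracleConnection Connection)\n" +
        "    string Company, string NoteCategory, string DocumentName, ref SqlConnection Connection)\n";

    // The lazy version of the pattern, with the same options as above.
    private static readonly Regex Lazy = new Regex(
        @"^.*?(\S*)\({1}.*?ref\s*SqlConnection",
        RegexOptions.Singleline | RegexOptions.Compiled);

    // Returns the text of capture group 1, or a marker when nothing matches.
    public static string FirstGroup()
    {
        Match m = Lazy.Match(Source);
        return m.Success ? m.Groups[1].Value : "<no match>";
    }

    public static void Main()
    {
        // The lazy .*? gives way as soon as \S* followed by "(" can match.
        Console.WriteLine(FirstGroup()); // InsertNote
    }
}
```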

Logan Murphy