
I am providing a textbox in which a user can enter a regular expression to match filenames. I plan to detect any named capture groups they provide with the Regex method GetGroupNames().

I also want to get the sub-expression they entered inside each named capture group.

As an example, they might enter a regular expression like this:

December (?<FileYear>\d{4}) Records\.xlsx

Is there a method or means to get the sub-expression \d{4} apart from manually parsing the regular expression string?

ΩmegaMan
John Kurtz

3 Answers

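Here is an ugly brute-force extension method that finds the sub-expression (or subpattern) by scanning the pattern text, without using another regex. It skips escaped characters and character classes, so \) and [)] are not mistaken for the group's closing parenthesis: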

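    using System;
    using System.Text.RegularExpressions;

    public static class RegexExtensions
    {
        // Returns the text between "(?<pCaptureName>" and its matching ")",
        // or null if the group is not written in that form.
        public static string GetSubExpression(this Regex pRegex, string pCaptureName)
        {
            string sRegex = pRegex.ToString();
            string sGroupText = "(?<" + pCaptureName + ">";
            int iGroupAt = sRegex.IndexOf(sGroupText, StringComparison.Ordinal);
            if (iGroupAt < 0)
                return null;
            int iStart = iGroupAt + sGroupText.Length;
            int iOpenParenCount = 0;   // unescaped "(" seen so far inside the group
            bool bEscaped = false;     // previous character was an escaping "\"
            bool bInClass = false;     // currently inside a [...] character class
            for (int i = iStart; i < sRegex.Length; i++)
            {
                char c = sRegex[i];
                if (bEscaped) { bEscaped = false; continue; }   // skips "\(", "\)", "\\"
                if (c == '\\') { bEscaped = true; continue; }
                if (bInClass) { if (c == ']') bInClass = false; continue; }
                if (c == '[') { bInClass = true; continue; }
                if (c == '(')
                    iOpenParenCount++;
                else if (c == ')')
                {
                    if (iOpenParenCount == 0)   // this ")" closes the named group
                        return sRegex.Substring(iStart, i - iStart);
                    iOpenParenCount--;
                }
            }
            return null;   // no matching ")" found
        }
    }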

The usage looks like this:

    Regex reFromUser = new Regex(txtFromUser.Text);
    string[] asGroupNames = reFromUser.GetGroupNames();
    int iItsInt;
    foreach (string sGroupName in asGroupNames)
    {
        if (!Int32.TryParse(sGroupName, out iItsInt)) //don't want numbered groups
        {
            string sSubExpression = reFromUser.GetSubExpression(sGroupName);
            if (sSubExpression == null)
                continue; // group not written as (?<name>...)
            //Do what I need to do with the sub-expression
        }
    }
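For reference, here is a minimal console sketch (assuming C# 9 or later for top-level statements) of what GetGroupNames() returns for the question's pattern: only names, including the implicit whole-match group "0", and no pattern text, which is why the extension above has to scan the pattern string. The hard-coded pattern stands in for txtFromUser.Text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from the question; in the real program it comes from the textbox.
var reFromUser = new Regex(@"December (?<FileYear>\d{4}) Records\.xlsx");

// GetGroupNames() returns group names only: "0" for the whole match plus every
// named group. It does not expose the sub-expression text of any group.
string[] asGroupNames = reFromUser.GetGroupNames();

// Keep only the named groups, using the same Int32.TryParse test as the usage above.
var namedGroups = new List<string>();
foreach (string sGroupName in asGroupNames)
{
    if (!int.TryParse(sGroupName, out _))
        namedGroups.Add(sGroupName);
}

Console.WriteLine(string.Join(", ", asGroupNames)); // 0, FileYear
Console.WriteLine(string.Join(", ", namedGroups));  // FileYear
```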

If you also want to generate test or sample data from a sub-expression, you can use the NuGet package "Fare" like this:

    //Generate test data for it
    Fare.Xeger X = new Fare.Xeger(sSubExpression);
    string sSample = X.Generate();
John Kurtz

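Here is a solution that uses a regular expression to match the named capture groups inside the user's regular expression. It relies on .NET balancing groups; the idea comes from the post "Using RegEx to balance match parenthesis". Laid out on several lines for readability (this form needs RegexOptions.IgnorePatternWhitespace, because the line breaks would otherwise be matched literally):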

\(\?\<(?<MyGroupName>\w+)\>
(?<MyExpression>
((?<BR>\()|(?<-BR>\))|[^()]*)+
)
\)

or more concisely...

\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)

and to use it might look like this:

string sGetCaptures = @"\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)";
MatchCollection MC = Regex.Matches(txtFromUser.Text, sGetCaptures );
foreach (Match M in MC)
{
    string sGroupName = M.Groups["MyGroupName"].Value;
    string sSubExpression = M.Groups["MyExpression"].Value;
    //Do what I need to do with the sub-expression
    MessageBox.Show(sGroupName + ":" + sSubExpression);
}

For the example in the original question, the message box would display FileYear:\d{4}.
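A console-only variant of the same balancing-group pattern, as a minimal sketch (assuming C# 9 or later; the local function GetNamedSubExpressions and the group name N are hypothetical, chosen for illustration), checks the question's example plus a group that contains nested parentheses:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The balancing-group pattern from this answer: BR is pushed on each "(" and
// popped on each ")", so nested parentheses stay inside MyExpression.
const string sGetCaptures =
    @"\(\?\<(?<MyGroupName>\w+)\>(?<MyExpression>((?<BR>\()|(?<-BR>\))|[^()]*)+)\)";

// Hypothetical helper: returns "name:sub-expression" for each (?<name>...) group.
List<string> GetNamedSubExpressions(string sUserPattern)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(sUserPattern, sGetCaptures))
        results.Add(m.Groups["MyGroupName"].Value + ":" + m.Groups["MyExpression"].Value);
    return results;
}

// The question's example, and a hypothetical group "N" with nested parentheses.
var fromQuestion = GetNamedSubExpressions(@"December (?<FileYear>\d{4}) Records\.xlsx");
var nested = GetNamedSubExpressions(@"(?<N>\d{2}(\d)(\d))");

Console.WriteLine(string.Join(" | ", fromQuestion)); // FileYear:\d{4}
Console.WriteLine(string.Join(" | ", nested));       // N:\d{2}(\d)(\d)
```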

Community
John Kurtz

This pattern (?<=\(\?<\w+\>)([^)]+) matches the sub-expression of each named capture group. It uses a positive lookbehind to make sure the matched text is preceded by (?<name>, so the group name itself is not part of the match.


using System.Linq;
using System.Text.RegularExpressions;

string data = @"December (?<FileYear>\d{4}) Records\.xlsx";
string pattern = @"(?<=\(\?<\w+\>)([^)]+)";

var expressions = Regex.Matches(data, pattern)
                       .OfType<Match>()
                       .Select(mt => mt.Groups[0].Value)
                       .ToList();

returns one item:

\d{4}

Data such as (?<FileMonth>[^\s]+)\s+(?<FileYear>\d{4}) Records\.xlsx would return two matches:

[^\s]+

\d{4}
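A minimal console sketch of the same lookbehind pattern (assuming C# 9 or later; the local function SubExpressions and the group name N are hypothetical) reproduces the two-group result above and also shows the nested-parentheses case raised in the comments, where [^)]+ stops at the first ")":

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The lookbehind pattern from this answer: a run of non-")" characters that is
// immediately preceded by "(?<name>".
const string pattern = @"(?<=\(\?<\w+\>)([^)]+)";

// Hypothetical helper returning every match of the pattern in the user's regex text.
string[] SubExpressions(string data) =>
    Regex.Matches(data, pattern)
         .OfType<Match>()
         .Select(mt => mt.Groups[0].Value)
         .ToArray();

var twoGroups = SubExpressions(@"(?<FileMonth>[^\s]+)\s+(?<FileYear>\d{4}) Records\.xlsx");
// Hypothetical group "N" with nested parentheses: [^)]+ stops at the first ")".
var nested = SubExpressions(@"(?<N>\d{2}(\d)(\d))");

Console.WriteLine(string.Join(" | ", twoGroups)); // [^\s]+ | \d{4}
Console.WriteLine(string.Join(" | ", nested));    // \d{2}(\d
```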

ΩmegaMan
  • This did not handle "(?\d{2}(\d)(\d))", only returning "\d{2}(\d". – John Kurtz Apr 19 '17 at 14:50
  • @JohnKurtz you did not provide that as a scenario/example. Please add all data example scenarios. – ΩmegaMan Apr 19 '17 at 15:21
  • Ah, you are correct, your answer meets the question. The handling of extra parenthesis was discussed in the other answers. – John Kurtz Apr 19 '17 at 20:56
  • @JohnKurtz If my understanding of what you need to do is correct, you may find that there are too many scenarios for a single regex to handle and that you need a parser. – ΩmegaMan Apr 19 '17 at 21:11