
I found the following code to find the n-th occurrence of a value in a text here.

This is the code:

public static int NthIndexOf(this string target, string value, int n)
{
    Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");

    if (m.Success)
        return m.Groups[2].Captures[n - 1].Index;
    else
        return -1;
}

I tried to find the index of the second occurrence of "</form>" in a webpage, and it failed, although the string definitely occurs at least twice in the text. I also cut off a prefix of the webpage so that the second occurrence became the first, and then the code did find the expression as the first occurrence.

In one of the comments on this code, someone wrote: "This Regex does not work if the target string contains linebreaks."

My two questions are:

  1. Why doesn't this code work if the target string contains linebreaks?

  2. How can I fix this code so that it also works for strings that contain linebreaks (replacing or removing the linebreaks is not a good solution for me)?

I'm not looking for other techniques to do the same thing.

Gari BN
  • Can you give an example of an input and output? – gunr2171 Sep 22 '14 at 17:43
  • If you're trying to read data out of HTML you might want to consider using the [Html Agility Pack](http://htmlagilitypack.codeplex.com/) instead of [regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Conrad Frix Sep 22 '14 at 17:49

2 Answers


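By default, the . in a regex matches any character except a newline, so .*? stops at the end of the line and the pattern can never reach an occurrence on a later line.

For what you want, you need to use Singleline mode, in which . also matches newlines, so your code should look something like this: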

 Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
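
For completeness, here is a minimal sketch of the whole extension method with that option applied. The class names StringExtensions and Demo and the sample string are made up for illustration, and the Regex.Escape call is an addition that is not in the original code; it is only there so that a value containing regex metacharacters is matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Returns the index of the n-th (1-based) occurrence of value in target,
    // or -1 if there are fewer than n occurrences.
    public static int NthIndexOf(this string target, string value, int n)
    {
        // Assumption: value is meant as literal text, so it is escaped here.
        string pattern = "((" + Regex.Escape(value) + ").*?){" + n + "}";

        // Singleline makes '.' match '\n' too, so '.*?' can cross line breaks.
        Match m = Regex.Match(target, pattern, RegexOptions.Singleline);

        // Group 2 is captured once per repetition; capture n - 1 is the n-th hit.
        return m.Success ? m.Groups[2].Captures[n - 1].Index : -1;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Hypothetical input: two closing tags on different lines.
        string html = "<form>a</form>\n<form>b</form>\n";

        Console.WriteLine(html.NthIndexOf("</form>", 1)); // 7
        Console.WriteLine(html.NthIndexOf("</form>", 2)); // 22
        Console.WriteLine(html.NthIndexOf("</form>", 3)); // -1
    }
}
```

Without RegexOptions.Singleline, the second call would return -1 for this input, because the lazy .*? cannot step over the "\n" between the two tags.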
Mzf
  • I have always used Multiline with regex, and looking at the link you provided, it appears these are two ways to do the same thing. Is there any reason you suggest Singleline instead of Multiline? –  Sep 22 '14 at 17:56
  • That is for you to choose. Note that in Multiline mode, ^ and $ match the beginning and end of each line, so you will need to pick whichever fits your case. – Mzf Sep 22 '14 at 18:01

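By default, the . in a regular expression does not match a newline, so the match cannot continue past the end of a line. To fix it you need to specify a regex option. Note that RegexOptions.Multiline only changes how ^ and $ behave; the option that lets . match newlines is RegexOptions.Singleline:

Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);

You can find more information about RegexOptions here.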
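
To see how the two options differ in practice, here is a small sketch. The class name OptionsDemo and the sample text are made up for illustration; only Regex.IsMatch and the two RegexOptions members come from the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    public static void Main()
    {
        // Hypothetical two-line input.
        string text = "ab\ncd";

        // "b.c" needs '.' to match the '\n' between 'b' and 'c'.
        Console.WriteLine(Regex.IsMatch(text, "b.c"));                          // False
        Console.WriteLine(Regex.IsMatch(text, "b.c", RegexOptions.Multiline));  // False
        Console.WriteLine(Regex.IsMatch(text, "b.c", RegexOptions.Singleline)); // True

        // "^cd$" needs '^' to match right after the '\n'.
        Console.WriteLine(Regex.IsMatch(text, "^cd$"));                         // False
        Console.WriteLine(Regex.IsMatch(text, "^cd$", RegexOptions.Multiline)); // True
    }
}
```

Multiline affects only the anchors ^ and $, while Singleline affects only the dot, so a pattern that relies on .*? to span several lines needs Singleline.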