I have the following regex: `(?<=>)(\w*)(?=<)`

It matches every word that sits between a `>` and a `<`.

However, I only want to grab the third occurrence. I have tried several variations with `{2}` inside the regex above to reach the third match, but so far nothing has worked. Any ideas will help.

Matt.G
Fayaz
  • Parsing HTML using regex, [he comes...](https://stackoverflow.com/a/1732454/542251) – Liam May 14 '18 at 13:50
  • *Any ideas will help*: don't try to parse HTML using regex, that way madness lies. Use something like the HTML Agility Pack instead. – Liam May 14 '18 at 13:52
  • Possible duplicate of [What is the best way to parse html in C#?](https://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c) – Liam May 14 '18 at 13:53
  • `(?<=>)(\w+)(?=<)` is better, but it still cannot capture the 3rd occurrence – Fayaz May 14 '18 at 14:12
  • Hi Liam. Not impossible, and I have done it many times before. Thanks for your comment/suggestion; now let's see if I can find the answer to the simple question I am asking. – Fayaz May 14 '18 at 14:18

1 Answer

Zero-width assertions such as lookahead and lookbehind do not consume any characters, so a quantifier like `{2}` applied to them cannot skip over the first n items. The pattern has to "eat" the characters of the earlier occurrences so that they are no longer available when the next occurrence is matched.
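If you would rather keep the lookaround pattern from the question, one alternative is to let it find every occurrence and pick the third match by index. This is only a minimal sketch: the class `ThirdMatchByIndex` and the helper `NthWord` are hypothetical names, and the pattern uses `\w+` (as suggested in the comments) so that empty matches are not counted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThirdMatchByIndex
{
    // Hypothetical helper: returns the n-th (1-based) word found between '>' and '<',
    // or null when the input contains fewer than n such words.
    public static string NthWord(string input, int n)
    {
        // Lookaround pattern from the question, with \w+ so empty matches are skipped.
        MatchCollection matches = Regex.Matches(input, @"(?<=>)\w+(?=<)");
        return matches.Count >= n ? matches[n - 1].Value : null;
    }

    public static void Main()
    {
        string input = "<a>1</a><b>2</b><a>3</a><b>4</b><e>5</e><a>6</a>";
        Console.WriteLine(NthWord(input, 3)); // prints 3
    }
}
```

The regex itself still matches every occurrence; the skipping happens in C# rather than inside the pattern.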

The expression below should do the trick. It consumes two occurrences of `>`, one or more characters other than `<`, and then `<`, before it matches the `>` of the third occurrence and captures the text that follows in group 1. To read that text, use `Match.Groups` and then `Group.Value` (or the first entry of `Group.Captures`).

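```csharp
using System;
using System.Text.RegularExpressions;

// Consume two ">text<" occurrences, then capture the text of the third in group 1.
var re = new Regex(@"(?:>[^<]+<.*?){2}>([^<]+)");
var matches = re.Matches(@"<a>1</a><b>2</b><a>3</a><b>4</b><e>5</e><a>6</a>");
foreach (Match match in matches) {
    // Groups[1].Value is the captured text (same as Groups[1].Captures[0].Value here).
    Console.WriteLine(match.Groups[1].Value);
}
// output:
// 3
// 6
```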
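Note that `Matches` returns every third occurrence (3 and 6 in the sample). If only the third one overall is wanted, a sketch along the following lines may help: it calls `Regex.Match`, which stops after the first match, and spells the same pattern out with `RegexOptions.IgnorePatternWhitespace` so each part can carry a comment. The class name `ThirdOccurrenceOnly` and the method `Third` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThirdOccurrenceOnly
{
    // Same idea as the pattern above, laid out with inline regex comments.
    private static readonly Regex Pattern = new Regex(@"
        (?:
            > [^<]+ <   # one occurrence: a closing bracket, some text, the next opening bracket
            .*?         # lazily skip ahead to the next candidate
        ){2}            # consume the first two occurrences
        >               # the bracket that opens the third occurrence
        ([^<]+)         # group 1: the text of the third occurrence
    ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the third text between '>' and '<', or null if there is none.
    public static string Third(string input)
    {
        Match m = Pattern.Match(input); // Match returns only the first match
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string input = "<a>1</a><b>2</b><a>3</a><b>4</b><e>5</e><a>6</a>";
        Console.WriteLine(Third(input)); // prints 3
    }
}
```

Because `.` does not match a newline by default, input that spans several lines would likely need `RegexOptions.Singleline` as well.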

And yes, this makes for a very bad HTML parser, but it is fine for a quick and dirty hack.
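For comparison, here is a rough sketch of a parser-based route using only the standard library. It assumes the fragment becomes well-formed XML once wrapped in a root element, which real-world HTML often is not (the HTML Agility Pack mentioned in the comments is meant for that case). It also reads the text of the third direct child element, which matches the regex result for the sample input but is not the same rule in general. `ThirdElementText` and `ThirdText` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ThirdElementText
{
    // Hypothetical helper: assumes the fragment is well-formed XML once wrapped in <root>.
    public static string ThirdText(string fragment)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        return root.Elements()              // direct child elements, in document order
                   .Select(e => e.Value)    // text content of each element
                   .ElementAtOrDefault(2);  // third item, or null if there are fewer
    }

    public static void Main()
    {
        string input = "<a>1</a><b>2</b><a>3</a><b>4</b><e>5</e><a>6</a>";
        Console.WriteLine(ThirdText(input)); // prints 3
    }
}
```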

Liam
Daniel Gimenez