
I have a MatchCollection and need the value of group index 1 from each match. Right now I extract it through a long chain of casts, which I would like to avoid.

Example: startTag = <a>, endTag = </a>, html = <a>texttexttext</a>.

I need to get "texttexttext" without the <a> and </a>.

 var regex = new Regex(startTag + "(.*?)" + endTag, RegexOptions.IgnoreCase);
 var matchCollection = regex.Matches(html);
 foreach (var item in matchCollection)
 {

      string temp = ((Match)(((Group)(item)).Captures.SyncRoot)).Groups[1].Value;
 } 
Mediator

2 Answers


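I would recommend using Html Agility Pack to parse HTML instead of regex: a regular expression copes poorly with attributes, nested tags, and malformed markup, while a parser handles them for you.

Applied to your example, finding the text of every anchor in an HTML document looks like this: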

using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        string html = "";
        using (var client = new WebClient())
        {
            html = client.DownloadString("http://stackoverflow.com");
        }

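        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches
        var links = doc.DocumentNode.SelectNodes("//a");
        if (links == null) return;
        foreach (HtmlNode link in links)
        {
            // Prints the text contained inside every anchor
            // on http://stackoverflow.com
            Console.WriteLine(link.InnerText);
        }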
    }
}
Darin Dimitrov

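You could use a capture group, and you might also want to use a named group. Notice the parentheses I added to the regex, and that the loop variable is declared as Match rather than var, so Groups can be read directly without any casts.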

        var html = "<a>xx yyy</a>   <a>bbb cccc</a>";
        var startTag = "<a>";
        var endTag = "</a>";
        var regex = new Regex(startTag + "((.*?))" + endTag, RegexOptions.IgnoreCase);
        var matchCollection = regex.Matches(html);
        foreach (Match item in matchCollection)
        {
            var data = item.Groups[1];
            Console.WriteLine(data);
        } 

This is a little nicer still, because a named group can be looked up by name instead of by index.

        var html = "<a>xx yyy</a>   <a>bbb cccc</a>";
        var startTag = "<a>";
        var endTag = "</a>";
        var regex = new Regex(startTag + "(?<txt>(.*?))" + endTag, RegexOptions.IgnoreCase);
        var matchCollection = regex.Matches(html);
        foreach (Match item in matchCollection)
        {
            var data = item.Groups["txt"];
            Console.WriteLine(data);
        } 
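
If you want the values as a list rather than printing them in a loop, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the class name TagText and the method Extract are made up for illustration, Regex.Escape is added in case the tags ever contain regex metacharacters, and RegexOptions.Singleline is added so the captured text may span several lines. Neither of those two additions is in the code above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original answer.
static class TagText
{
    public static List<string> Extract(string html, string startTag, string endTag)
    {
        // Regex.Escape keeps any metacharacters in the tags literal (assumption:
        // the tags are plain text, not patterns).
        var pattern = Regex.Escape(startTag) + "(?<txt>.*?)" + Regex.Escape(endTag);

        // Singleline lets '.' match newlines as well.
        var regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return regex.Matches(html)
                    .Cast<Match>()                      // typed view of the MatchCollection
                    .Select(m => m.Groups["txt"].Value) // named group, no manual casts
                    .ToList();
    }
}
```

Calling TagText.Extract(html, startTag, endTag) with the sample values above should give a list containing "xx yyy" and "bbb cccc". For anything beyond simple, non-nested tags, a real HTML parser is still the safer choice.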
ek_ny