Regex.Match problems in C#: extracting data from a website

Question

I am learning C# and I am trying to extract data from a website.

So far I have managed to get the data I need. But because the value I am trying to extract is inside a hyperlink, I run into problems.

I am trying to extract the name of a person; in the page's source code it is written as:

<td class="name"><a href="/fodbold/biografi/patrick-kristensen/">Patrick Kristensen</a>

I use this to extract it:

MatchCollection NameOfPlayer = Regex.Matches(html, "<td class=\"name\"><a href=\"/fodbold/biografi/patrick-kristensen/\">\\s*(.+?)\\s*</a>", RegexOptions.Singleline);

To extract every person, I need to ignore the

<a href="/fodbold/biografi/patrick-kristensen/">

part, since the URL is different for each player. But how?

Thanks!

[Use a dom parser instead !](http://stackoverflow.com/a/1732454/1519058) — Enissay, Feb 07 '16 at 14:48
Check [*How to Get element by class in HtmlAgilityPack*](http://stackoverflow.com/questions/23040482/how-to-get-element-by-class-in-htmlagilitypack). Knowing when to use regex and when not to use it is also a valuable skill. — Wiktor Stribiżew, Feb 07 '16 at 14:52

jdweng · Answer 1 · 2016-02-07T15:04:23.737


How about this:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Sample input: eight name cells with placeholder hrefs and link text.
            string input =
                "<td class=\"name\"><a href=\"a\">s</a>" +
                "<td class=\"name\"><a href=\"b\">t</a>" +
                "<td class=\"name\"><a href=\"c\">u</a>" +
                "<td class=\"name\"><a href=\"d\">v</a>" +
                "<td class=\"name\"><a href=\"e\">w</a>" +
                "<td class=\"name\"><a href=\"f\">x</a>" +
                "<td class=\"name\"><a href=\"g\">y</a>" +
                "<td class=\"name\"><a href=\"h\">z</a>";

            // href=[^>]*>     skips the href attribute up to the end of the <a> tag,
            // (?'name'[^<]*)  captures the link text up to the next '<'.
            string pattern = @"href=[^>]*>(?'name'[^<]*)";
            MatchCollection matches = Regex.Matches(input, pattern);

            foreach (Match match in matches)
            {
                string name = match.Groups["name"].Value;
                Console.WriteLine(name);
            }
            Console.ReadLine();
        }
    }
}
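The `href=[^>]*>` pattern above captures the text after *any* link, not only links inside `<td class="name">`. A minimal sketch of a tighter variant, assuming every player cell has exactly the shape `<td class="name"><a href="...">Name</a>` shown in the question: the hard-coded URL from the question's pattern is replaced by `[^"]*`, so one pattern matches every player. The `Example Club` cell and its URL are hypothetical input added only to show the difference, and `NameExtractor`/`ExtractNames` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Assumption: every player cell looks like
    //   <td class="name"><a href="...">Player Name</a>
    // [^"]* skips whatever URL is in the href; (?<name>[^<]*?) captures the link text,
    // and the surrounding \s* trim whitespace around it.
    private static readonly Regex NameCell = new Regex(
        "<td class=\"name\">\\s*<a href=\"[^\"]*\">\\s*(?<name>[^<]*?)\\s*</a>");

    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match match in NameCell.Matches(html))
        {
            names.Add(match.Groups["name"].Value);
        }
        return names;
    }

    public static void Main()
    {
        // The second cell is hypothetical, included only to show a non-player link.
        string html =
            "<td class=\"name\"><a href=\"/fodbold/biografi/patrick-kristensen/\">Patrick Kristensen</a></td>" +
            "<td class=\"club\"><a href=\"/hypothetical/club/\">Example Club</a></td>";

        // The href-only pattern picks up the text of every link, including the club link.
        foreach (Match m in Regex.Matches(html, @"href=[^>]*>(?'name'[^<]*)"))
            Console.WriteLine("href-only pattern: " + m.Groups["name"].Value);

        // The class-restricted pattern returns only the player name.
        foreach (string name in ExtractNames(html))
            Console.WriteLine("class=\"name\" pattern: " + name);
    }
}
```

Run as written, the href-only pattern prints both `Patrick Kristensen` and `Example Club`, while the class-restricted pattern prints only `Patrick Kristensen`. It still depends on the exact markup (attribute order, quoting, extra attributes), which is why the comments recommend an HTML parser such as HtmlAgilityPack.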

answered Feb 07 '16 at 15:00 by jdweng · edited Feb 07 '16 at 15:04

It may fetch more than just names. – Wiktor Stribiżew Feb 07 '16 at 15:01
It will fetch the inner text that follows any href attribute, i.e. everything between the ">" and the next "<". – jdweng Feb 07 '16 at 15:11
@ChristianLarsen: Only HtmlAgilityPack will work for you safely, easily and most conveniently. – Wiktor Stribiżew Feb 07 '16 at 15:41
I assume my test code works? The issue may have to do with the data being on more than one line. You may need to use the regex option Singleline mode. – jdweng Feb 07 '16 at 17:42
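Whether `RegexOptions.Singleline` matters depends on the pattern. In .NET, `Singleline` only changes `.` so that it also matches `\n`; negated character classes such as `[^>]` and `[^<]` match newlines regardless. A small sketch comparing the two cases, using a hypothetical version of the question's markup with the name on its own line (`MultilineDemo` and its members are illustrative names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Hypothetical markup where the link text sits on its own line.
    public const string Html =
        "<td class=\"name\"><a href=\"/fodbold/biografi/patrick-kristensen/\">\n" +
        "    Patrick Kristensen\n" +
        "</a>";

    // Negated classes like [^>] and [^<] match '\n' without any option.
    public static string WithNegatedClass() =>
        Regex.Match(Html, @"href=[^>]*>\s*(?<name>[^<]*?)\s*</a>").Groups["name"].Value;

    // '.' does not match '\n' unless RegexOptions.Singleline is set.
    public static bool DotMatches(RegexOptions options) =>
        Regex.IsMatch(Html, @"href=""[^""]*"">.+?</a>", options);

    public static void Main()
    {
        Console.WriteLine(WithNegatedClass());                  // Patrick Kristensen
        Console.WriteLine(DotMatches(RegexOptions.None));       // False
        Console.WriteLine(DotMatches(RegexOptions.Singleline)); // True
    }
}
```

So a pattern built only from negated classes, like the one in the answer, should not need `Singleline`, while a pattern built on `.+?`, like the one in the question, does when the markup spans lines.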
