
I'm trying to extract the strings that follow a pattern in a long string, which is essentially the HTML output of a page.

For example, I need to extract the target of the href attribute from this string:

<h2 class=\"product-name\"><a href=\"/erkek-ayakkabi-spor-gri-17sfd3007141340-p\" title=\"...">...</a></h2>\r\n

What I need from it: erkek-ayakkabi-spor-gri-17sfd3007141340-p

I also need to find the other strings like the one above, so I need to search for the href attributes that come after class=\"product-name\" in the HTML string.

How can I achieve this?

Ege Bayrak
  • See [What is the best way to parse html in C#?](http://stackoverflow.com/questions/56107). – Wiktor Stribiżew May 11 '17 at 06:59
  • I'm working on code that is already written, and I just need to make a minimal change. I don't have time to fundamentally change the way we parse HTML now. Maybe later. – Ege Bayrak May 11 '17 at 07:01

1 Answer


Please check this.

Regex:

class=\"product-name\"(.*)<a\shref=\"(.*?)\"

Updated Regex (the first capture group is dropped, so the href value is in group 1):

class=\"product-name\".*<a\shref=\"(.*?)\"

Regex101 Example.

C# Code:

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string data = "<h2 class=\"product-name\"><a href=\"erkek-ayakkabi-spor-gri-17sfd3007141340-p\" title=\"...\">...</a></h2>\r\n<h2 class=\"test-name\"><a href=\"erkek-ayakkabi-spor-gri-17sfd3007141340-p\" title=\"...\">...</a></h2>\r\n<h2 class=\"product-name\"><a href=\"erkek-ayakkabi-spor-gri-17sfd3007141340-p\" title=\"...\">...</a></h2>\r\n";
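        // First version: the href value is in capture group 2.
        //string regex = "class=\"product-name\"(.*)<a\\shref=\"(.*?)\"";
        // Updated version: the href value is in capture group 1.
        string regex = "class=\"product-name\".*<a\\shref=\"(.*?)\"";
        // '.' does not match '\n' by default, so each match stays on one line of input;
        // RegexOptions.Multiline only changes how ^ and $ behave.
        var matches = Regex.Matches(data, regex, RegexOptions.Multiline);
        foreach (Match item in matches)
        {
            //Console.WriteLine("Value: " + item.Groups[2]);
            Console.WriteLine("Value: " + item.Groups[1]);
        }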
    }
}

DotNetFiddle Example.
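
The sample href in the question starts with a slash, while the desired result does not. Here is a minimal sketch of one possible variant that handles this, not a definitive implementation. It makes three assumptions that go beyond the answer above: a lazy `.*?` is used so the match stops at the first link after the class attribute, an optional leading `/` is skipped before the capture, and `RegexOptions.Singleline` is set so the heading and its link may sit on different lines. The names `ProductLinkExtractor`, `ExtractSlugs` and the `slug` group are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class ProductLinkExtractor
{
    // class="product-name", then as little text as possible up to the next
    // <a href="...">. "/?" consumes an optional leading slash, and the named
    // group "slug" captures everything up to the closing quote.
    // Assumption: Singleline lets '.' match line breaks between the two tags.
    private static readonly Regex ProductHref = new Regex(
        @"class=""product-name"".*?<a\s+href=""/?(?<slug>[^""]*)""",
        RegexOptions.Singleline);

    public static List<string> ExtractSlugs(string html)
    {
        var slugs = new List<string>();
        foreach (Match match in ProductHref.Matches(html))
        {
            slugs.Add(match.Groups["slug"].Value);
        }
        return slugs;
    }
}

public class Program
{
    public static void Main()
    {
        // Same shape as the sample data above, with the leading slash from the question.
        string data =
            "<h2 class=\"product-name\"><a href=\"/erkek-ayakkabi-spor-gri-17sfd3007141340-p\" title=\"...\">...</a></h2>\r\n" +
            "<h2 class=\"test-name\"><a href=\"/erkek-ayakkabi-spor-gri-17sfd3007141340-p\" title=\"...\">...</a></h2>\r\n";

        foreach (string slug in ProductLinkExtractor.ExtractSlugs(data))
        {
            Console.WriteLine("Value: " + slug);
        }
    }
}
```

One caveat with this sketch: because `.` may cross line breaks here, a product-name element that contains no link would pair with the next link anywhere later in the page. If that can happen in your markup, dropping `RegexOptions.Singleline` keeps each match on a single line, as in the original code.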

csharpbd