I want to produce the following output using Regex.Replace:

Input:

<h4 class=\"nikstyle_title\"><a rel=\"nofollow\" target=\"_blank\" href="http://www.sample.com">my text</a></h4>

Output:

<h4 class=\"nikstyle_title\"> </h4>
MahdiAliz

2 Answers

You should not use regex to parse HTML; use an HTML parser instead. Here is an example of how to do it with HtmlAgilityPack.

Add the HtmlAgilityPack NuGet package to your project from the Package Manager Console:

Install-Package HtmlAgilityPack

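The code (it needs `using System.Collections.Generic;`, `using System.Linq;` and `using HtmlAgilityPack;` at the top of the file):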

        static void Main(string[] args)
        {
            string html = @"<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

<table>
    <tr>
        <td>A!!</td>
        <td>te2</td>
        <td>2!!</td>
        <td>te43</td>
        <td></td>
        <td> !!</td>
        <td>.!!</td>
        <td>te53</td>
        <td>te2</td>
        <td>texx</td>
    </tr>
</table>

<h4 class=""nikstyle_title""><a rel=""nofollow"" target=""_blank"" href=""http://www.niksalehi.com/ccount/click.php?ref=ZDNkM0xuQmxjbk5wWVc1MkxtTnZiUT09&id=117""><span class=""text-matn-title-bold-black"">my text</span></a></h4>

</body>
</html>";

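            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Select every <h4> whose class attribute contains "nikstyle_title".
            List<HtmlNode> h4Nodes = doc.DocumentNode.Descendants().Where(x => x.Name == "h4" && x.Attributes.Contains("class") && x.Attributes["class"].Value.Contains("nikstyle_title")).ToList();

            // Empty each matched element but keep the <h4> tag itself.
            foreach (HtmlNode node in h4Nodes)
            {
                node.InnerHtml = "";
            }

            // Serialize the modified document back to a string.
            string html2 = doc.DocumentNode.InnerHtml;
        }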

EDIT:

For your second request, removing every <a> tag whose href contains `http://www.sample.com`:

        static void Main(string[] args)
        {
            string html = @"<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

<table>
    <tr>
        <td>A!!</td>
        <td>te2</td>
        <td>2!!</td>
        <td>te43</td>
        <td></td>
        <td> !!</td>
        <td>.!!</td>
        <td>te53</td>
        <td>te2</td>
        <td>texx</td>

    </tr>
</table>

<h4 class=""nikstyle_title""><a rel=""nofollow"" target=""_blank"" href=""http://www.sample.com""><span class=""text-matn-title-bold-black"">my text</span></a></h4>
<div><a rel=""nofollow"" target=""_blank"" href=""http://www.sample.com""><span class=""text-matn-title-bold-black"">my text</span></a></div>
</body>
</html>";

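            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Select every <a> whose href attribute contains the target URL.
            List<HtmlNode> anchorNodes = doc.DocumentNode.Descendants().Where(x => x.Name == "a" && x.Attributes.Contains("href") && x.Attributes["href"].Value.Contains("http://www.sample.com")).ToList();

            // Remove each matched <a> element, including everything inside it.
            foreach (HtmlNode node in anchorNodes)
            {
                node.Remove();
            }

            // Serialize the modified document back to a string.
            string html2 = doc.DocumentNode.InnerHtml;
        }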

Also, I personally prefer verbatim strings (prefixed with @) because they are more readable, as in my example. Inside a verbatim string you escape a double quote by doubling it, for example: class=""a"".
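To make the escaping difference concrete, here is a minimal sketch that writes the same markup both ways. The class and constant names (`QuoteSamples`, `Regular`, `Verbatim`) are made up for this illustration.

```csharp
using System;

static class QuoteSamples
{
    // Regular string literal: each double quote is escaped with a backslash.
    public const string Regular = "<h4 class=\"nikstyle_title\"> </h4>";

    // Verbatim string literal: each double quote is escaped by doubling it.
    public const string Verbatim = @"<h4 class=""nikstyle_title""> </h4>";
}

class Program
{
    static void Main()
    {
        // Both literals produce exactly the same string at runtime.
        Console.WriteLine(QuoteSamples.Regular == QuoteSamples.Verbatim); // True
    }
}
```

Verbatim strings also allow line breaks inside the literal, which is why the multi-line HTML samples in this answer use them.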

mybirthname

HtmlAgilityPack does not fit every case. Sometimes a regex is the quicker option. In C# you can use this code:

    // Requires: using System.Text.RegularExpressions;
    string htmlString = ""; // your HTML goes here
    var regex = new Regex("(?<open><h4 class=\"nikstyle_title\">).*?(?<close></h4>)", RegexOptions.Singleline);
    // Keep both tags and replace whatever is between them with a single space.
    htmlString = regex.Replace(htmlString, "${open} ${close}");

The regex is:

(?<open><h4 class="nikstyle_title">).*?(?<close></h4>)
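If the tag name is not known in advance and only the class is, the same idea can be sketched with a captured tag name and a backreference. This is a rough illustration, not a general HTML solution: `TitleCleaner` and `ClearContent` are hypothetical names, and the pattern assumes the class attribute is written exactly as class="nikstyle_title", that attribute values contain no `>`, and that the element does not contain a nested element with the same tag name.

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleCleaner
{
    // Matches any element whose opening tag carries class="nikstyle_title".
    // "tag" captures the element name, "attrs" captures its attributes, and
    // \k<tag> requires the closing tag to use the same name.
    private static readonly Regex Pattern = new Regex(
        @"<(?<tag>\w+)(?<attrs>[^>]*\bclass=""nikstyle_title""[^>]*)>.*?</\k<tag>>",
        RegexOptions.Singleline);

    // Replaces the element content with a single space and keeps both tags.
    public static string ClearContent(string html)
    {
        return Pattern.Replace(html, "<${tag}${attrs}> </${tag}>");
    }
}

class Program
{
    static void Main()
    {
        string input = @"<h4 class=""nikstyle_title""><a rel=""nofollow"" target=""_blank"" href=""http://www.sample.com"">my text</a></h4>";
        Console.WriteLine(TitleCleaner.ClearContent(input));
        // prints: <h4 class="nikstyle_title"> </h4>
    }
}
```

RegexOptions.Singleline lets `.` match line breaks, so content that spans several lines is cleared as well.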
Vladislav
  • You should never use regex! Also, please tell me in which case HtmlAgilityPack will not save you? The solution is pretty easy in this case. – mybirthname Dec 25 '14 at 14:55
  • Some websites are built incorrectly. For example: bla bla bla some text without a tag, but I need it .... – Vladislav Dec 25 '14 at 14:59
  • Please read my question again. It is not just the h4 tag; it can be another tag, so I don't know the tag. I want to remove anything between – MahdiAliz Dec 25 '14 at 15:05
  • When the HTML is invalid, you should fix the HTML, not the method you use to read it! – mybirthname Dec 25 '14 at 15:21
  • Hehe, this should work if you are parsing your own website. But what if I want to parse Yahoo News, for example? – Vladislav Dec 26 '14 at 08:04