Manipulate HTML from ASP.NET Before Importing to Excel

Question

I'm exporting an HTML table from ASP.NET to Excel, using Response.Write to write all of the table's HTML to an Excel file.

Now I need to remove all the hyperlinks from this Excel file. Is there a better way to do this than a regex?

If a regex is the best way, how can I remove just the tags and keep the ID in between?

    <td class="header">Details ID</td>
      <td>
         <div class="id"><a class="details" href="details?id=1232" target="_blank">1232</a></div>
      </td>
      <td>
         <div class="id"><a class="details" href="details?id=1233" target="_blank">1233</a></div>
      </td>
      <td>
         <div class="id"><a class="details" href="details?id=1234" target="_blank">1234</a></div>
      </td>
   </tr>

Don't use RegEx with HTML: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — John Saunders, Aug 23 '12 at 23:28

Leniel Maccaferri · Accepted Answer · 2012-08-24T01:28:03.840

This simple regex will do it:

</?(a|A).*?>

Here's a link to test it with your input:

http://regexhero.net/tester/?id=c1458e14-de87-4f57-9850-3ee00e573566
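
For reference, here is a minimal sketch of how a regex like this could be applied in C# before the HTML is written out. It uses a slightly stricter pattern than the one above, `</?a\b[^>]*>`, so that tags which merely start with "a" (such as `<abbr>` or `<address>`) are left alone. The `LinkStripper` class and `StripAnchorTags` method are hypothetical names chosen for illustration, and the pattern assumes no attribute value contains a literal `>`.

```csharp
using System.Text.RegularExpressions;

public static class LinkStripper
{
    // Matches <a ...> and </a> only. The \b after "a" stops tags such as
    // <abbr> or <address> from matching; IgnoreCase also covers <A ...>.
    private static readonly Regex AnchorTag =
        new Regex(@"</?a\b[^>]*>", RegexOptions.IgnoreCase);

    // Removes the opening and closing anchor tags and keeps the text between them.
    public static string StripAnchorTags(string html)
    {
        return AnchorTag.Replace(html, string.Empty);
    }
}
```

Calling `LinkStripper.StripAnchorTags` on one of the cells from the question, for example `<div class="id"><a class="details" href="details?id=1232" target="_blank">1232</a></div>`, leaves `<div class="id">1232</div>`.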

If, like John Saunders, you'd rather not parse HTML with a regex, you can use HtmlAgilityPack:

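using System;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(RemoveHyperlinksButKeepText(@"C:\YourHtmlFile.html"));
    }

    // Returns the document's HTML with every <a> element replaced by its inner text.
    private static string RemoveHyperlinksButKeepText(string path)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.Load(path);

        string html = htmlDoc.DocumentNode.OuterHtml;

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var links = htmlDoc.DocumentNode.SelectNodes("//a");
        if (links == null)
        {
            return html;
        }

        foreach (HtmlNode link in links)
        {
            // Swap the whole <a ...>text</a> markup for just its text.
            html = html.Replace(link.OuterHtml, link.InnerText);
        }

        return html;
    }
}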
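
As a variation on the same idea, the links can also be replaced inside the parsed document instead of through string replacement on the serialized HTML. The following is only a sketch under a few assumptions: it takes the HTML as a string (for instance, the markup about to be passed to Response.Write), the `AnchorUnwrapper` class and `UnwrapAnchors` method are hypothetical names, and any markup nested inside a link is dropped because only `InnerText` is kept.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class AnchorUnwrapper
{
    // Hypothetical helper: replaces each <a> element with a text node
    // holding that element's inner text, then returns the resulting HTML.
    public static string UnwrapAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the document has no <a> elements.
        var links = doc.DocumentNode.SelectNodes("//a");
        if (links == null)
        {
            return html;
        }

        // Iterate over a copy so the tree can be modified inside the loop.
        foreach (HtmlNode link in links.ToList())
        {
            HtmlNode text = doc.CreateTextNode(link.InnerText);
            link.ParentNode.ReplaceChild(text, link);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Editing the node tree avoids relying on each link's `OuterHtml` appearing verbatim in the serialized document, which is what the string-replacement version depends on.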

edited Aug 24 '12 at 01:28

answered Aug 23 '12 at 23:07

-1: See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – John Saunders Aug 23 '12 at 23:27
@JohnSaunders: What do you think about HtmlAgilityPack? It's a good option in this situation... – Leniel Maccaferri Aug 24 '12 at 00:28
I've never used it, but many people say many good things about it. Many people say many bad things about using regular expressions. – John Saunders Aug 24 '12 at 00:53
+1 for providing both solutions. Html Agility Pack is what I'd use for this problem. It's much better suited to this sort of thing. – Steve Wortham Aug 29 '12 at 20:52
