I'm exporting an HtmlTable in ASP.NET, using Response.Write to write all of the HTML table markup to an Excel file.

What I need to do now is remove all the hyperlinks from this Excel file. Is there a better way to do this than a regex?

If a regex is the best way, how can I strip just the <a> tags and keep the ID between them?

    <tr>
        <td class="header">Details ID</td>
        <td>
            <div class="id"><a class="details" href="details?id=1232" target="_blank">1232</a></div>
        </td>
        <td>
            <div class="id"><a class="details" href="details?id=1233" target="_blank">1233</a></div>
        </td>
        <td>
            <div class="id"><a class="details" href="details?id=1234" target="_blank">1234</a></div>
        </td>
    </tr>
John Saunders
Garrett
Don't use RegEx with HTML: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – John Saunders Aug 23 '12 at 23:28

1 Answer

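This simple regex will do it for the markup in your question:

</?(a|A).*?>

Be aware that it also matches any other tag whose name starts with "a" (<abbr>, <address> and so on). If your table can contain those, the stricter </?[aA]\b[^>]*> matches only <a> tags.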

Here's a link to test it with your input:

http://regexhero.net/tester/?id=c1458e14-de87-4f57-9850-3ee00e573566
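
To show how the regex approach could be applied in C# before the markup is passed to Response.Write, here is a minimal sketch. It uses a stricter variant of the pattern above (a word boundary after the tag name plus RegexOptions.IgnoreCase) rather than the exact pattern from this answer, and the names StripAnchorsDemo and StripAnchorTags are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public class StripAnchorsDemo
{
    // Removes opening and closing <a> tags but leaves the text between them.
    // \b keeps the pattern from also matching tags such as <abbr> or <address>.
    public static string StripAnchorTags(string html)
    {
        return Regex.Replace(html, @"</?a\b[^>]*>", string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // One cell taken from the table in the question.
        string cell = "<div class=\"id\"><a class=\"details\" href=\"details?id=1232\" target=\"_blank\">1232</a></div>";

        Console.WriteLine(StripAnchorTags(cell));
        // prints: <div class="id">1232</div>
    }
}
```

In the export code you would run the table's HTML string through StripAnchorTags once and write the result. Like any regex over HTML, this assumes reasonably well-formed tags with no ">" inside attribute values.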

If, like John Saunders, you'd rather not parse HTML with a regex, you can use HtmlAgilityPack:

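using System;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(RemoveHyperlinksButKeepText(@"C:\YourHtmlFile.html"));
    }

    // Loads the HTML file and returns its markup with every <a> element
    // replaced by that element's inner text.
    private static string RemoveHyperlinksButKeepText(string path)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.Load(path);

        string html = htmlDoc.DocumentNode.OuterHtml;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = htmlDoc.DocumentNode.SelectNodes("//a");
        if (links == null)
        {
            return html;
        }

        foreach (HtmlNode link in links)
        {
            // Swap the whole <a ...>text</a> markup for just its text.
            html = html.Replace(link.OuterHtml, link.InnerText);
        }

        return html;
    }
}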
Leniel Maccaferri