1

I'm looking for a good function to remove HTML from a string of HTML. Ideas?

Paul Fryer
  • do you want to remove or to escape HTML? – DennyRolling Nov 06 '10 at 21:27
  • Trying to remove it. I know this could result in some strange strings, but that's what I need to do with the system I'm integrating with. Thanks. – Paul Fryer Nov 06 '10 at 21:32
  • Similar question: http://stackoverflow.com/questions/787932/using-c-regular-expressions-to-remove-html-tags – eldarerathis Nov 06 '10 at 21:39
  • possible duplicate of [How to extract text from resonably sane HTML?](http://stackoverflow.com/questions/2113651/how-to-extract-text-from-resonably-sane-html) – Wim Coenen Nov 06 '10 at 21:42

2 Answers

6

I have not tested this extensively, but I found it a while back and it has worked for my needs:

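// Strips anything that looks like an HTML tag. This is a regex heuristic, not an HTML parser.
public static string StripTags(string html) {
    // Lazily match "<", then one or more characters (newlines included), up to the next ">".
    System.Text.RegularExpressions.Regex objRegExp = new System.Text.RegularExpressions.Regex("<(.|\\n)+?>");
    return objRegExp.Replace(html, "");
}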
Anthony Greco
  • It is worthwhile to note that due to the nature of HTML, it is impossible to write perfectly complete regular expressions to parse HTML. See [here](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) and [here](http://stackoverflow.com/questions/133601). – Phil Hunt Nov 06 '10 at 21:37
  • OK, I converted it using [http://www.developerfusion.com/tools/convert/vb-to-csharp/]. It is probably one of the best tools I use daily when googling code samples, because I always find C# examples for things I need in VB.NET. – Anthony Greco Nov 06 '10 at 21:39
  • @Anthony Thanks for the info, even if it is not C#, I can easily convert from VB to C#. That basically worked for what I'm trying to do. – Paul Fryer Nov 06 '10 at 22:49
  • No problem. As they said, regular expressions won't always work (nor will any solution, for that matter), especially because you will often find that programmers forgot to close their tags, etc., but it does work in the majority of situations. Glad I was able to help. – Anthony Greco Nov 06 '10 at 23:00
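
To make the caveat from the comments concrete, here is a minimal sketch that runs the same pattern on one well-formed input and on one input with a `>` inside an attribute value. The class name `TagStripDemo` and the sample strings are made up for illustration; only the regular expression comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Same pattern as in the answer: "<", one or more characters
    // (newlines included, matched lazily), then the first ">".
    private static readonly Regex TagPattern = new Regex("<(.|\\n)+?>");

    public static string StripTags(string html)
    {
        return TagPattern.Replace(html, "");
    }

    public static void Main()
    {
        // Well-formed markup: every tag is removed.
        Console.WriteLine(StripTags("<p>Hello <b>world</b></p>"));
        // prints: Hello world

        // A ">" inside an attribute value ends the match too early,
        // so part of the tag leaks into the output.
        Console.WriteLine(StripTags("<a title=\"a > b\">link</a>"));
        // prints:  b">link
    }
}
```

If input like the second case can reach the function, a parser-based approach is probably the safer choice.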
2

Take a look at this c-strip-xmlhtml-from-string or Html Agility Pack
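
As a rough sketch of the Html Agility Pack route (not code from the linked pages): it assumes the third-party `HtmlAgilityPack` NuGet package is referenced, and the class and method names `HtmlText.ToPlainText` are hypothetical.

```csharp
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HtmlText
{
    // Hypothetical helper: parse the markup and return only its text content.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser; copes with unclosed tags

        // InnerText concatenates the text nodes of the document;
        // DeEntitize converts entities such as &amp; back to characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Calling `HtmlText.ToPlainText("<p>Hello <b>world</b></p>")` should give `Hello world`. Depending on the input, the text of `script` or `style` elements may also appear in the result, so it may be worth removing those nodes first.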

nubm