With regard to this solution, how can we adapt it to retain tabs and other valid plain-text layout?

Referenced solution:

public static string StripHTML(string HTMLText, bool decode = true)
{
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    var stripped = reg.Replace(HTMLText, "");
    return decode ? HttpUtility.HtmlDecode(stripped) : stripped;
}
Charles Okwuagwu

1 Answer

I'm not sure what you mean; it does preserve tabs and newlines. The regex only removes the tags themselves, so any whitespace between them is left untouched:

void Main()
{
    var html = "<html>\n\t<body>\n\t\tBody text!\n\t</body>\n</html>";

    StripHTML(html).Dump(); // Prints "\n\t\n\t\tBody text!\n\t\n"
}

public static string StripHTML(string HTMLText, bool decode = true)
{
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    var stripped = reg.Replace(HTMLText, "");
    return decode ? HttpUtility.HtmlDecode(stripped) : stripped;
}
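
Note that `.Dump()` is a LINQPad extension, so the snippet above only runs there. If you want to check this outside LINQPad, here is a minimal console sketch of the same test. It is only a sketch under two assumptions of mine: it uses `System.Net.WebUtility.HtmlDecode` in place of `HttpUtility.HtmlDecode` to avoid a `System.Web` reference, and it escapes the control characters before printing so that they are visible.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class Program
{
    // Same regex as above. WebUtility.HtmlDecode is swapped in for
    // HttpUtility.HtmlDecode (an assumption, to avoid referencing System.Web).
    public static string StripHTML(string htmlText, bool decode = true)
    {
        var stripped = Regex.Replace(htmlText, "<[^>]+>", "");
        return decode ? WebUtility.HtmlDecode(stripped) : stripped;
    }

    public static void Main()
    {
        var html = "<html>\n\t<body>\n\t\tBody text!\n\t</body>\n</html>";
        var result = StripHTML(html);

        // Escape tabs and newlines so they show up in the console output.
        Console.WriteLine(result.Replace("\n", "\\n").Replace("\t", "\\t"));
        // Prints: \n\t\n\t\tBody text!\n\t\n
    }
}
```

Only the tags are removed; every tab and newline that sat between them is still in the result.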
mdickin