I'm trying to take a description that was entered in a WYSIWYG editor and take a substring of it.

For example:

This is some <span style="font-weight:bold;">text</span>

I'd like to limit the length of some descriptions without breaking the HTML. If I just take a substring and append "...", it breaks the HTML tags.

I've tried:

string HtmlSubstring(string html, int maxlength)
{
    string htmltag = "</?\\w+((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>";
    string emptytags = "<(\\w+)((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?></\\1>";

    var expression = new Regex(string.Format("({0})|(.?)", htmltag));
    MatchCollection matches = expression.Matches(html);
    int i = 0;

    StringBuilder content = new StringBuilder();
    foreach (Match match in matches)
    {
        if (match.Value.Length == 1 && i < maxlength)
        {
            content.Append(match.Value);
            i++;
        }
        else if (match.Value.Length > 1)
        {
            content.Append(match.Value);
        }
    }
    return Regex.Replace(content.ToString(), emptytags, string.Empty);
}

but it doesn't quite get me there!

TheLearningDev
  • Can you guarantee that the input is HTML encoded? Meaning, if a user types a `>` will it already be translated to `&gt;`? – Yuck May 25 '11 at 02:34
  • See this question for how to do HTML regex: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – JK. May 25 '11 at 02:43

1 Answer

Use the HTML Agility Pack to load the HTML and then get InnerText.

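using HtmlAgilityPack;

var document = new HtmlDocument();
document.LoadHtml("...");
// InnerText returns the document's text content with all tags stripped.
string text = document.DocumentNode.InnerText;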

See also the related question "C#: HtmlAgilityPack extract inner text".
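
To connect this back to the question, here is a minimal sketch of how the InnerText approach could be used to build a shortened description. Note that it sidesteps the broken-tag problem by discarding the markup entirely, so bold text and other formatting are lost in the summary. `HtmlSummary` and `PlainTextSummary` are hypothetical names made up for illustration, not part of HTML Agility Pack, and the sketch assumes InnerText may still contain encoded entities such as `&amp;`, which is why it calls `HtmlEntity.DeEntitize` before counting characters.

```csharp
using System;
using HtmlAgilityPack;

static class HtmlSummary
{
    // Hypothetical helper: strips all markup, then truncates the remaining
    // plain text to at most maxLength characters, appending "..." if cut.
    public static string PlainTextSummary(string html, int maxLength)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // InnerText concatenates the text nodes; decode entities so that
        // "&amp;" counts as one character rather than five (assumption:
        // InnerText has not already decoded them).
        string text = HtmlEntity.DeEntitize(document.DocumentNode.InnerText);

        if (text.Length <= maxLength)
        {
            return text;
        }

        return text.Substring(0, maxLength) + "...";
    }
}
```

With the example from the question, `HtmlSummary.PlainTextSummary("This is some <span style=\"font-weight:bold;\">text</span>", 12)` should give `This is some...`. The result is plain text, so it would need to be HTML-encoded again (for instance with `System.Net.WebUtility.HtmlEncode`) before being written back into a page.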

wp78de
Richard Schneider