
I have a textarea that uses the TinyMCE editor to turn it into a rich text editor. I want to extract the text of all headings (H1, H2, etc.) without any styling or formatting.
Suppose txtEditor.InnerText gives me a value like the following:

<p><span style="font-family: comic sans ms,sans-serif; color: #993366; font-size: large; background-color: #33cccc;">This is before heading one</span></p>
<h1><span style="font-family: comic sans ms,sans-serif; color: #993366;">Hello This is Headone</span></h1>
<p>this is before heading2</p>
<h2>This is heading2</h2>

I want to get a list containing only the text of the heading tags. Any suggestions or guidance would be appreciated.

Uwe Keim
Mohammad Arshad Alam

2 Answers


Use HtmlAgilityPack; then it's easy:

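  using System.Linq;
  using HtmlAgilityPack;

  var doc = new HtmlDocument();
  doc.LoadHtml(txtEditor.InnerText);
  // InnerText of each <h1> node returns its text with nested tags such as <span> removed
  var h1Elements = doc.DocumentNode.Descendants("h1").Select(nd => nd.InnerText);
  string h1Text = string.Join(" ", h1Elements);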
Antonio Bakula

Referencing "Regular Expression to Read Tags in HTML", I believe this is close to what you are looking for:

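using System.Text.RegularExpressions;

// h([1-6]) covers every heading level; \1 makes the closing tag match the opening level
String headingRegex = "<h([1-6])[^>]*?>(?<TagText>.*?)</h\\1>";

MatchCollection mc = Regex.Matches(html, headingRegex, RegexOptions.IgnoreCase | RegexOptions.Singleline);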
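The regex above only locates the headings. For the <h1> in the question, the TagText group still contains the nested <span style="..."> markup, so one more step is needed to get plain text. What follows is a minimal sketch of that step, assuming only System.Text.RegularExpressions and System.Net.WebUtility (no HtmlAgilityPack). The class and method names (HeadingExtractor, GetHeadingTexts) are hypothetical. Like any regex approach, it can miss or mis-split headings in malformed or unusual markup, so treat it as a quick workaround rather than an HTML parser.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a regex-only sketch, not a real HTML parser.
public static class HeadingExtractor
{
    // <h([1-6])...> opens any heading level; \1 requires the closing tag to use the same level.
    // Singleline lets .*? cross line breaks; IgnoreCase also accepts <H1>.
    private static readonly Regex HeadingRegex = new Regex(
        @"<h([1-6])[^>]*>(?<TagText>.*?)</h\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any tag left inside a heading, such as the <span style="..."> in the question's <h1>.
    private static readonly Regex InnerTagRegex = new Regex(@"<[^>]+>");

    // Returns the plain text of every heading, in document order.
    public static List<string> GetHeadingTexts(string html)
    {
        var headings = new List<string>();
        foreach (Match match in HeadingRegex.Matches(html))
        {
            string innerHtml = match.Groups["TagText"].Value;
            string text = InnerTagRegex.Replace(innerHtml, string.Empty);
            // Turn entities such as &amp; back into the characters they represent.
            headings.Add(WebUtility.HtmlDecode(text).Trim());
        }
        return headings;
    }
}
```

Called as HeadingExtractor.GetHeadingTexts(txtEditor.InnerText) on the markup from the question, this should return "Hello This is Headone" and "This is heading2". The \1 backreference is a small safeguard: a stray closing tag of a different level, such as </h2> after <h1>, does not end the match early.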
Community
  • Don't parse html with regular expressions... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Matt Jan 09 '13 at 14:32
  • Nice link. I'm looking into the HtmlAgilityPack now, looks interesting. – Chris Ayers Jan 09 '13 at 14:43