Extract heading text from HTML text

Question

I have a textarea that uses the TinyMCE editor as a rich text editor. I want to extract the text of all headings (H1, H2, etc.) without style or formatting.
Suppose that txtEditor.InnerText gives me a value like the one below:

<p><span style="font-family: comic sans ms,sans-serif; color: #993366; font-size: large; background-color: #33cccc;">This is before heading one</span></p>
<h1><span style="font-family: comic sans ms,sans-serif; color: #993366;">Hello This is Headone</span></h1>
<p>this is before heading2</p>
<h2>This is heading2</h2>

I want to get a list of the heading tags' text only. Any suggestions or guidance would be appreciated.

Antonio Bakula · Accepted Answer · 2016-11-19T11:39:23.840

3

Use HtmlAgilityPack, and then it's easy:

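  // Requires the HtmlAgilityPack package, plus: using System.Linq; using HtmlAgilityPack;
  var doc = new HtmlDocument();
  doc.LoadHtml(txtEditor.InnerText);
  // Collect the text of every <h1> element, markup stripped; pass "h2", "h3", etc. for other levels.
  var h1Elements = doc.DocumentNode.Descendants("h1").Select(nd => nd.InnerText);
  string h1Text = string.Join(" ", h1Elements);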
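
Since the question asks for every heading level rather than only H1, the same idea can be stretched to h1 through h6 in a single pass. This is a minimal sketch, not part of the original answer: the class name HeadingExtractor and the method GetHeadingTexts are hypothetical, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party: NuGet package "HtmlAgilityPack"

// Hypothetical helper, for illustration only.
public static class HeadingExtractor
{
    // HtmlAgilityPack reports element names in lower case.
    private static readonly HashSet<string> HeadingTags =
        new HashSet<string> { "h1", "h2", "h3", "h4", "h5", "h6" };

    // Returns the plain text of every heading, in document order.
    public static List<string> GetHeadingTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()                                    // every node in the tree
            .Where(node => HeadingTags.Contains(node.Name))   // keep h1..h6 only
            .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim())
            .ToList();
    }
}
```

Called as HeadingExtractor.GetHeadingTexts(txtEditor.InnerText) on the sample from the question, this should give a two-item list: "Hello This is Headone" and "This is heading2". InnerText drops the nested span and its style attribute, and DeEntitize turns entities such as &amp;amp; back into plain characters.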

answered Jan 09 '13 at 14:26


I am using it in a web application (ASP.NET). I can't find the HtmlDocument class. – Mohammad Arshad Alam Jan 09 '13 at 14:46
HtmlAgilityPack is an open source library that is not included in the standard libraries. Download it (link is in answer), or better, use NuGet. – Antonio Bakula Jan 09 '13 at 14:47
Isn't it possible without a DLL? – Mohammad Arshad Alam Jan 09 '13 at 15:06
No, it's not possible to use HtmlAgilityPack without its binaries. – Antonio Bakula Jan 09 '13 at 15:16

score 0 · Answer 2 · edited May 23 '17 at 12:24

Referencing "Regular Expression to Read Tags in HTML",
I believe that this is close to what you are looking for:

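// Requires: using System.Text.RegularExpressions; html holds the markup to search.
// Matches any heading h1-h6; the inner HTML is captured in the "TagText" group.
String h1Regex = "<h[1-6][^>]*?>(?<TagText>.*?)</h[1-6]>";

MatchCollection mc = Regex.Matches(html, h1Regex);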
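
To get from the MatchCollection to a list of plain heading strings, each match's TagText group still has to be read and any nested tags removed. The sketch below is one way to do that using only the standard library. It is not the answerer's code: the class and method names are hypothetical, the pattern is a tightened variant (levels 1 to 6, with a backreference so the closing tag must match the opening level), and it remains fragile on malformed or unusual HTML.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RegexHeadingExtractor
{
    // Opening h1-h6 tag, lazily captured inner HTML, then a closing tag of the same level (\1).
    private static readonly Regex HeadingPattern = new Regex(
        @"<h([1-6])\b[^>]*>(?<TagText>.*?)</h\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any tag left inside a heading, such as <span style="...">.
    private static readonly Regex InnerTagPattern = new Regex(@"<[^>]+>");

    // Returns the text of every heading found, in order of appearance.
    public static List<string> GetHeadingTexts(string html)
    {
        var headings = new List<string>();
        foreach (Match match in HeadingPattern.Matches(html))
        {
            string innerHtml = match.Groups["TagText"].Value;
            string text = InnerTagPattern.Replace(innerHtml, string.Empty);
            headings.Add(WebUtility.HtmlDecode(text).Trim()); // decode entities like &amp;
        }
        return headings;
    }
}
```

RegexOptions.Singleline lets a heading span several lines, and IgnoreCase accepts upper-case tags such as <H1>. For the sample in the question, this should return "Hello This is Headone" and "This is heading2".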

answered Jan 09 '13 at 14:29

Chris Ayers


Don't parse html with regular expressions... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Matt Jan 09 '13 at 14:32
Nice link. I'm looking into the HtmlAgilityPack now, looks interesting. – Chris Ayers Jan 09 '13 at 14:43
