Having read so much about not using regexes to strip HTML, I am wondering how to get some links into my RichTextBox without all the messy HTML that also comes with the content I download from a newspaper site.
What I have: HTML from a newspaper website.
What I want: the article as plain text in a RichTextBox, but with links (that is, replacing each <a href="foo">bar</a> with <Hyperlink NavigateUri="foo">bar</Hyperlink>).
HtmlAgilityPack gives me HtmlNode.InnerText (stripped of all HTML tags) and HtmlNode.InnerHtml (with all tags). I can get the URL and text of the link(s) with articlenode.SelectNodes(".//a"), but how do I know where to insert them in the plain text of HtmlNode.InnerText?
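To make the problem concrete, here is a minimal sketch of what I have so far. The sample HTML and the XPath for the article node are placeholders I made up for illustration, not the real site's markup:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Placeholder HTML standing in for the downloaded article (assumption).
        string html = "<div class='article'><p>Read <a href=\"foo\">bar</a> today.</p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Hypothetical XPath; the real page will need a different expression.
        HtmlNode articleNode = doc.DocumentNode.SelectSingleNode("//div[@class='article']");

        // Plain text: the link text survives, but its URL and position are lost.
        Console.WriteLine(articleNode.InnerText);

        // Links: URL and text are available, but not where they sit in InnerText.
        HtmlNodeCollection links = articleNode.SelectNodes(".//a");
        if (links != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
            }
        }
    }
}
```

The two outputs are not connected to each other, which is exactly the gap I do not know how to bridge.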
Any hint would be appreciated.