I have a string containing something like this:
string textToFormat = "<p>test <span> <font> here </font> </span> try</p><p> <font> try 2</font> </p>";
What I need is to filter it like this:
- Keep the text inside P
- Remove Span and its content (font and text)
- Keep the text inside font if its direct parent is not a Span
What I have so far:
StringBuilder sbtexttoCorrect = new StringBuilder();
HtmlDocument html = new HtmlDocument();
html.LoadHtml(textToFormat);
var nodes = html.DocumentNode.SelectNodes("//p");
foreach (var line in nodes)
{
    if (line.Name == "SPAN")
    {
        line.RemoveAllChildren();
        line.Remove();
    }
}
foreach (var txt in nodes)
{
    sbtexttoCorrect.Append(txt.InnerText);
}
But at the end, sbtexttoCorrect still contains the text of the font child of the span, even with the RemoveAllChildren() call and the span's own Remove().
What am I missing?
Note: in another post, someone suggested this:
foreach (var line in nodes
    .Select(node => node.ChildNodes.Where(childNode => childNode.Name != "span"))
    .Select(textNodes => textNodes.Aggregate(String.Empty, (current, node) => current + node.InnerText)))
{
    sbtexttoCorrect.Append(line);
}
But I do not understand all of the syntax, so I wanted to rewrite my own attempt. It also did not work every time: it still picks up the text inside the Font inside the Span.
Note 2: I can't find any documentation for the HTML Agility Pack. If someone knows where to find it, I'd like to learn more about this library.
Edit: The real HTML is much more complex, with an unknown number of child nodes, which can be TD or DIV. The only thing I know for sure is that when there is a span, I need to skip its content and its child nodes.
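To make the goal concrete, here is a minimal sketch of the filtering I am after. It assumes every span can be selected with the XPath //span and removed before reading InnerText, and that the text I want always sits under a P. The class SpanFilter and the method StripSpans are hypothetical names for illustration, and I have not checked this against the real, more complex HTML.

```csharp
using System.Linq;
using System.Text;
using HtmlAgilityPack;

public static class SpanFilter
{
    // Hypothetical helper: returns the text of every <p>, skipping any <span> subtree.
    public static string StripSpans(string textToFormat)
    {
        var html = new HtmlDocument();
        html.LoadHtml(textToFormat);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        var spans = html.DocumentNode.SelectNodes("//span");
        if (spans != null)
        {
            // Iterate over a copy so removing nodes cannot disturb the enumeration.
            foreach (var span in spans.ToList())
            {
                // Detaches the span from its parent together with its children.
                span.Remove();
            }
        }

        var sb = new StringBuilder();
        var paragraphs = html.DocumentNode.SelectNodes("//p");
        if (paragraphs != null)
        {
            foreach (var p in paragraphs)
            {
                // InnerText now contains only text that was outside any span.
                sb.Append(p.InnerText);
            }
        }
        return sb.ToString();
    }
}
```

The difference from my loop above is that the spans are selected directly instead of being looked for among the nodes returned by "//p", which only ever contains P elements.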