I can parse the document and generate output, but the output cannot be parsed into an XElement because one p tag is left unclosed. Everything else in the string is parsed correctly.
My input:
var input = "<p> Not sure why is is null for some wierd reason!<br><br>I have implemented the auto save feature, but does it really work after 100s?<br></p> <p> <i>Autosave?? </i> </p> <p>we are talking...</p><p></p><hr><p><br class=\"GENTICS_ephemera\"></p>";
My code:
public static XElement CleanupHtml(string input)
{
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.OptionOutputAsXml = true;
    //htmlDoc.OptionWriteEmptyNodes = true;
    //htmlDoc.OptionAutoCloseOnEnd = true;
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.LoadHtml(input);

    // ParseErrors holds any errors reported by LoadHtml
    if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)
    {
        // Parse errors found: fall through and return an empty <body>
    }
    else
    {
        if (htmlDoc.DocumentNode != null)
        {
            // Wrap the parsed markup in a <body> element of a fresh document
            var ndoc = new HtmlDocument();
            HtmlNode p = ndoc.CreateElement("body");
            p.InnerHtml = htmlDoc.DocumentNode.InnerHtml;

            // Self-close the void elements so the string is well-formed XML
            var result = p.OuterHtml.Replace("<br>", "<br/>");
            result = result.Replace("<br class=\"special_class\">", "<br/>");
            result = result.Replace("<hr>", "<hr/>");
            return XElement.Parse(result, LoadOptions.PreserveWhitespace);
        }
    }
    return new XElement("body");
}
My output:
<body>
<p> Not sure why is is null for some wierd reason chappy!
<br/>
<br/>I have implemented the auto save feature, but does it really work after 100s?
<br/>
</p>
<p>
<i>Autosave?? </i>
</p>
<p>we are talking...</p>
**<p>**
<hr/>
<p>
<br/>
</p>
</body>
The p tag marked in bold (**<p>**) is the one that is not output correctly: it is never closed, which is why the result cannot be parsed into an XElement. Is there a way around this, or am I doing something wrong in the code?