I use HtmlAgilityPack version 1.11.22 to sanitize user content. In production, some threads consume a lot of CPU time for many hours, as if HtmlAgilityPack were stuck in an infinite loop.
I took a memory dump of the application and found one thread that had been running for 2 hours and 28 minutes. Its call stack shows these HtmlAgilityPack calls:
mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Insert(System.__Canon, System.__Canon, Boolean)+bb
HtmlAgilityPack.HtmlDocument.SetIdForNode(HtmlAgilityPack.HtmlNode, System.String)+58
HtmlAgilityPack.HtmlNode.AppendChild(HtmlAgilityPack.HtmlNode)+43
HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32, Boolean)+62
HtmlAgilityPack.HtmlDocument.Parse()+1007
HtmlAgilityPack.HtmlDocument.Load(System.IO.TextReader)+197
HtmlAgilityPack.HtmlDocument.LoadHtml(System.String)+49
Habitus.Toolkit.Api.HtmlSanitizerConverter.Sanitize(System.Object)+b8
As you can see, the last HtmlAgilityPack call is SetIdForNode(), reached from a LoadHtml() call.
We sanitize every string coming from the user, including strings that are not HTML:
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(userContent);
string output = string.Empty;
foreach (var node in htmlDocument.DocumentNode.ChildNodes)
{
    output += node.InnerText;
}
return output;
What is wrong, and why is HtmlAgilityPack consuming all my CPU time?