I use HtmlAgilityPack version 1.11.22 to sanitize user content. In production, some threads consume a lot of CPU time for many hours, as if HtmlAgilityPack were stuck in an infinite loop.

I took a dump of my application and found one thread that had been running for 2 hours and 28 minutes. Its call stack shows the following HtmlAgilityPack calls:

mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Insert(System.__Canon, System.__Canon, Boolean)+bb
HtmlAgilityPack.HtmlDocument.SetIdForNode(HtmlAgilityPack.HtmlNode, System.String)+58    
HtmlAgilityPack.HtmlNode.AppendChild(HtmlAgilityPack.HtmlNode)+43    
HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32, Boolean)+62    
HtmlAgilityPack.HtmlDocument.Parse()+1007    
HtmlAgilityPack.HtmlDocument.Load(System.IO.TextReader)+197    
HtmlAgilityPack.HtmlDocument.LoadHtml(System.String)+49    
Habitus.Toolkit.Api.HtmlSanitizerConverter.Sanitize(System.Object)+b8 

As you can see, the last HtmlAgilityPack call on the stack is SetIdForNode(), reached from a call to LoadHtml().

We sanitize every string coming from the user, including strings that are not HTML.

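// Parse the user-supplied string as HTML.
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(userContent);
string output = string.Empty;
// Keep only the text of each top-level node (markup is dropped).
foreach (var node in htmlDocument.DocumentNode.ChildNodes)
{
    output += node.InnerText;
}
return output;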

What is wrong, and why is HtmlAgilityPack consuming all my CPU time?

Alexandre TRINDADE
  • I don't know if it will affect it, but using a StringBuilder instead of a string for `output` will make it more efficient in terms of memory use and speed if there are more than perhaps 5 strings being concatenated: [String vs. StringBuilder](https://stackoverflow.com/questions/73883/string-vs-stringbuilder). – Andrew Morton Oct 06 '21 at 12:31
  • Thanks @AndrewMorton, you're right, I will change it. But I don't think that is the cause of my problem. – Alexandre TRINDADE Oct 06 '21 at 12:33
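
For reference, the StringBuilder change suggested in the comments could look roughly like the sketch below. It is only an illustration of that suggestion and is not expected to explain the CPU usage. The class and method names (`HtmlTextExtractor`, `ExtractText`) and the null/empty guard are assumptions that do not appear in the question; the HtmlAgilityPack calls are the same ones the question already uses.

```csharp
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper; the name is not from the question.
public static class HtmlTextExtractor
{
    public static string ExtractText(string userContent)
    {
        // Assumption: null or empty input means there is nothing to sanitize.
        if (string.IsNullOrEmpty(userContent))
        {
            return string.Empty;
        }

        // A new HtmlDocument is created for every call, as in the question.
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(userContent);

        // Append into one buffer instead of allocating a new string per node.
        var output = new StringBuilder();
        foreach (HtmlNode node in htmlDocument.DocumentNode.ChildNodes)
        {
            output.Append(node.InnerText);
        }
        return output.ToString();
    }
}
```

The behaviour is meant to match the original loop: only the text of each top-level node is kept, and the result is returned as a single string.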
