I'm using HtmlAgilityPack to parse roughly 200,000 HTML documents.
I cannot predict the contents of these documents, and one of them causes my application to fail with a StackOverflowException. The document contains this HTML:
<ol>
<li><li><li><li><li><li>...
</ol>
There are roughly 10,000 <li> elements nested like that. Because of the way HtmlAgilityPack parses HTML, this nesting causes a StackOverflowException.
Unfortunately a StackOverflowException is not catchable in .NET 2.0 and later.
I considered giving the processing threads a larger stack, but that is a hack. It would make my program use a lot more memory (it starts about 50 threads for processing HTML, and every one of them would get the larger stack), and the size would need manual adjustment if the program ever came across a similar situation again.
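For reference, the stack-size approach I'm reluctant to use would look roughly like this. It is only a sketch: ParseDocuments and the 16 MB figure are placeholders I made up for illustration, not my real code. The one real API involved is the Thread constructor overload that takes a maxStackSize argument.

```csharp
using System;
using System.Threading;

class LargeStackWorker
{
    // Hypothetical value chosen for illustration; it would have to be
    // tuned by hand for the deepest nesting the parser might encounter.
    const int StackSizeBytes = 16 * 1024 * 1024;

    // Placeholder for the loop that hands documents to HtmlAgilityPack.
    static void ParseDocuments()
    {
        Console.WriteLine("Parsing on a thread with an enlarged stack.");
    }

    static void Main()
    {
        // Thread(ThreadStart, int maxStackSize) requests a larger stack
        // for this one thread only.
        var worker = new Thread(ParseDocuments, StackSizeBytes);
        worker.Start();
        worker.Join();
    }
}
```

With about 50 worker threads, each one would be created this way, which is where the memory concern comes from.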
Are there any other workarounds I could employ?