This is nested about 10 functions deep, so I'll just paste the relevant bits:
This line is really slow:
    var nodes = Filter_Chunk(Traverse(), chunks.First());
Specifically, this chunk inside Filter_Chunk (pun not intended):
    private static IEnumerable<HtmlNode> Filter_Chunk(IEnumerable<HtmlNode> nodes, string selectorChunk)
    {
        // ...
        string tagName = selectorChunk;
        foreach (var node in nodes)
            if (node.Name == tagName)
                yield return node;
    }
There's nothing complicated in there, so I'm thinking it must be the sheer number of nodes that Traverse() produces, right?
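To sanity-check the idea that the filter itself is cheap, here is a rough sketch that counts how much work the same name test does. It is a stand-in, not the real code: it assumes System.Xml.Linq's XElement in place of HtmlAgilityPack's HtmlNode, and FilterByName is a hypothetical local function that performs the loop's comparison via LINQ's Where. It makes exactly one comparison per input node, so its cost simply tracks however many nodes Traverse() hands it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Counts how many name comparisons the filter performs.
int comparisons = 0;

// Hypothetical equivalent of the loop in Filter_Chunk: one lazy pass over the
// input, one name comparison per node (XElement stands in for HtmlNode).
IEnumerable<XElement> FilterByName(IEnumerable<XElement> nodes, string tagName) =>
    nodes.Where(n => { comparisons++; return n.Name.LocalName == tagName; });

var doc = XElement.Parse("<div><h3/><p/><h3/><ul><li/><li/></ul></div>");
var all = doc.DescendantsAndSelf().ToList();   // 7 elements
var h3s = FilterByName(all, "h3").ToList();    // enumerates the input once

Console.WriteLine($"{h3s.Count} matches, {comparisons} comparisons for {all.Count} nodes");
// prints "2 matches, 7 comparisons for 7 nodes"
```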
    public IEnumerable<HtmlNode> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }

    public SharpQuery Children()
    {
        return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
    }
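A small reproduction of the Traverse()/Children() recursion suggests where the time goes. This sketch assumes XElement as a stand-in for HtmlNode and a plain List&lt;XElement&gt; for SharpQuery's _context; Build and Traverse here are hypothetical local functions. Because Children() collects the children of every node in the context, not just the current node, the loop re-walks the whole next level once per sibling and yields the same elements repeatedly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: a tree `depth` levels deep where every element has `width` children.
XElement Build(int depth, int width) =>
    new XElement("div",
        Enumerable.Range(0, depth > 0 ? width : 0).Select(_ => Build(depth - 1, width)));

// Mirrors the question's Traverse()/Children(): after yielding each node, it
// recurses into the children of the *whole* context, not just that node's.
IEnumerable<XElement> Traverse(List<XElement> context)
{
    foreach (var node in context)
    {
        yield return node;
        var children = context.SelectMany(n => n.Elements()).ToList();
        foreach (var child in Traverse(children))
            yield return child;
    }
}

var root = Build(2, 3);                                       // 1 + 3 + 9 = 13 elements
int actual = root.DescendantsAndSelf().Count();
int yielded = Traverse(new List<XElement> { root }).Count();
Console.WriteLine($"{actual} elements, {yielded} yielded");   // prints "13 elements, 31 yielded"
```

For this 13-element tree it yields 31 nodes. In general the count grows roughly with the product of the level sizes rather than their sum, which would plausibly turn a page of a few thousand elements into minutes of work.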
I tried finding <h3> nodes on stackoverflow.com. There shouldn't be more than a couple of thousand nodes on that page, should there? So why is this taking several minutes to complete?
Actually, there's definitely a bug in here somewhere that causes it to return more nodes than it should. I've forked the question to address that issue.
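One possible shape for a traversal that visits each element exactly once is sketched below, under the same assumptions (XElement standing in for HtmlNode; TraverseOnce is a hypothetical name). The key difference is descending into each node's own children rather than the children of the whole context. In SharpQuery terms that would presumably mean recursing on the current node's children instead of calling Children(), though that is an untested guess about the class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical fixed traversal: pre-order walk that descends into each node's
// *own* children, so every element is yielded exactly once. An explicit stack
// replaces the nested iterators of the recursive version.
IEnumerable<XElement> TraverseOnce(IEnumerable<XElement> context)
{
    // Stack(IEnumerable) pushes in order, so reverse to pop the first node first.
    var stack = new Stack<XElement>(context.Reverse());
    while (stack.Count > 0)
    {
        var node = stack.Pop();
        yield return node;
        // Push children in reverse so they come back out in document order.
        foreach (var child in node.Elements().Reverse())
            stack.Push(child);
    }
}

var doc = XElement.Parse("<html><body><h3/><div><h3/><p/></div></body></html>");
var names = TraverseOnce(new[] { doc }).Select(e => e.Name.LocalName).ToList();
Console.WriteLine(string.Join(" ", names));   // prints "html body h3 div h3 p"
```

The explicit stack is a design choice, not a requirement: a recursive iterator over each node's own children would also yield every element once, but each item would then pass through one iterator per level of nesting.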