HtmlAgilityPack SelectNodes, Disposing

Question

I am doing some screen scraping with HtmlAgilityPack: I call SelectNodes and then try to read some values from each node it returns.

Here is the code:

private readonly HtmlDocument _document = new HtmlDocument();

public void ParseValues(string html)
{
    _document.LoadHtml(html);
    var tables = _document.DocumentNode.SelectNodes("//table");

    foreach (var table in tables)
    {
        _document.LoadHtml(table.OuterHtml);
        var value = _document.DocumentNode.SelectSingleNode("//tbody[1]/tr/td[0]");
    }
}

But I have noticed that when I try to select child nodes inside the foreach loop, the query actually searches from the document root rather than from the current table, which is really annoying. That is why I reload each table's OuterHtml into the document.

Questions:

1. Is there a way to select the values from each table returned by SelectNodes without having to create a new HtmlDocument instance or reload the HTML?
2. Is there a way to dispose of an HtmlDocument? I have noticed what looks like a memory leak every time I call _document.LoadHtml(html);

score 1 · Answer 1 · edited May 23 '17 at 10:24

(for a more detailed explanation, see Html Agility Pack - Problem selecting subnode)

You don't have to create another HtmlDocument object or load any more HTML into the existing one. You just have to query each table node directly:

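foreach (var table in tables)
{
    // The leading "." makes the query relative to the current table node.
    // XPath positions are 1-based, so td[1] is the first cell (td[0] never matches).
    var value = table.SelectSingleNode(".//tbody[1]/tr/td[1]");
}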

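The key is to use .//tbody instead of //tbody. An XPath that starts with // is evaluated from the document root, no matter which node you call it on. The leading dot makes the expression relative to the current node, so only that table's descendants are searched.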
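To make the difference concrete, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name TableScraper, the helper FirstCells, and the sample HTML are made up for illustration. The sketch also guards against SelectNodes returning null, which it does when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TableScraper
{
    // Hypothetical helper: collects the text of the first cell found in each table.
    public static List<string> FirstCells(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = document.DocumentNode.SelectNodes("//table");
        if (tables == null)
        {
            return results;
        }

        foreach (var table in tables)
        {
            // ".//" keeps the search inside this table; XPath positions start at 1.
            var cell = table.SelectSingleNode(".//tr/td[1]");
            if (cell != null)
            {
                results.Add(cell.InnerText.Trim());
            }
        }

        return results;
    }

    public static void Main()
    {
        // Illustrative input only: two small tables with two cells each.
        const string html =
            "<table><tr><td>a1</td><td>a2</td></tr></table>" +
            "<table><tr><td>b1</td><td>b2</td></tr></table>";

        foreach (var text in FirstCells(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

If the query inside the loop were written as //tr/td[1] instead, every iteration would search from the document root and return the first table's cell each time.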

edited May 23 '17 at 10:24 by Community
answered Feb 24 '13 at 03:51 by Oscar Mederos

So what about disposing the HtmlDocument? – Roman Ratskey Feb 24 '13 at 15:19
Also, I get this error if I do not create a new instance of HtmlDocument: "startIndex cannot be larger than length of string." – Roman Ratskey Feb 24 '13 at 15:46
I'm only answering your first question. – Oscar Mederos Feb 25 '13 at 07:08
I don't understand... does that error appear if you use my `foreach` instead of yours? – Oscar Mederos Feb 25 '13 at 07:10
Please share the HTML you're parsing so that I can debug it here and give you a more detailed answer. You can use [pastebin](http://pastebin.com/) or a similar service. – Oscar Mederos Feb 26 '13 at 03:28
