This is a newbie question so please provide working code.

How do I count the tables in an HTML file using C# and the Html Agility Pack?

(I need to get values from specific tables in an HTML file based on the number of tables, and then perform some math on the retrieved values.)

Here is a sample file with three tables for your convenience:

<html>
<head>
<title>Tables</title>
</head>
<body>
<table border="1">
  <tr>
    <th>Name</th>
    <th>Phone</th>
    <th>City</th>
    <th>Number</th>
  </tr>
  <tr>
    <td>Scott</td>
    <td>555-2345</td>
    <td>Chicago</td>
    <td>42</td>
  </tr>
  <tr>
    <td>Bill</td>
    <td>555-1243</td>
    <td>Detroit</td>
    <td>23</td>
  </tr>
  <tr>
    <td>Ted</td>
    <td>555-3567</td>
    <td>Columbus</td>
    <td>9</td>
  </tr>
</table>
<p></p>
<table border="1">
  <tr>
    <th>Name</th>
    <th>Year</th>
  </tr>
  <tr>
    <td>Abraham</td>
    <td>1865</td>
  </tr>
  <tr>
    <td>Martin</td>
    <td>1968</td>
  </tr>
  <tr>
    <td>John</td>
    <td>1963</td>
  </tr>
</table>
<p></p>
<table border="1">
  <tr>
    <th>Animal</th>
    <th>Location</th>
    <th>Number</th>
  </tr>
  <tr>
    <td>Tiger</td>
    <td>Jungle</td>
    <td>8</td>
  </tr>
  <tr>
    <td>Hippo</td>
    <td>River</td>
    <td>4</td>
  </tr>
  <tr>
    <td>Camel</td>
    <td>Desert</td>
    <td>3</td>
  </tr>
</table>
</body>
</html>

If you would, please SHOW how to send the results to a new text file.

Thanks!

user1944272

2 Answers

I think this can be a starting point

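// requires: using System; using System.Linq;
// html is a string containing the HTML markup
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// materialize the tables once so the document is not traversed twice
var tables = doc.DocumentNode.Descendants("table").ToList();
int tablesCount = tables.Count;

foreach (var table in tables)
{
    // one list of cell texts per row; header rows contain only th cells and are skipped
    var rows = table.Descendants("tr")
                    .Select(tr => tr.Descendants("td").Select(td => td.InnerText).ToList())
                    .Where(cells => cells.Count > 0)
                    .ToList();

    foreach (var row in rows)
        Console.WriteLine(String.Join(",", row));
}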
I4V
  • I4V, I was not able to get your code to work. I would like to understand it. I replaced the "html" in brackets with the address to my html file and ran the code. All that occurred was that a black screen quickly flashed once. Nothing else. I added "Console.ReadLine();" after the last "}" in the code you supplied and ran it. The black screen now stays open and the cursor flashes at the beginning of it. No values are returned. I do not see how "int tablesCount = tables.Count();" could ever be output and I am not sure what the remainder of the code is supposed to do. Please clarify. – user1944272 Apr 28 '13 at 19:38
  • `I replaced the "html" in brackets with the address to my html file and ran the code`. No, `LoadHtml` expects an html **string**, if you want to load from **file** you should use `Load` – I4V Apr 28 '13 at 19:54
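
To make the `LoadHtml` versus `Load` distinction concrete, here is a minimal sketch that counts tables both ways and prints the count. The class name `TableCounter`, its method names, and the default path `tables.html` are made up for illustration; only the Html Agility Pack calls come from the library. Note that `Descendants("table")` also counts tables nested inside other tables.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class TableCounter
{
    // Counts <table> elements in markup that is already in memory.
    public static int CountInString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);   // the argument is the HTML text itself
        return doc.DocumentNode.Descendants("table").Count();
    }

    // Counts <table> elements in a file on disk.
    public static int CountInFile(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);       // the argument is a file path
        return doc.DocumentNode.Descendants("table").Count();
    }

    public static void Main(string[] args)
    {
        // "tables.html" is a hypothetical default; pass your own path as the first argument.
        string path = args.Length > 0 ? args[0] : "tables.html";

        Console.WriteLine("Tables in file: " + CountInFile(path));
        Console.WriteLine("Tables in string: " + CountInString("<table></table><table></table>"));

        // Keep the console window open until Enter is pressed.
        Console.ReadLine();
    }
}
```

Passing a file path to `LoadHtml` parses the path text as if it were markup, so no tables are found and the loop prints nothing, which matches the blank console described in the comment above.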

Something like this:

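// requires: using System.IO; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(myTestFile); // myTestFile is the path to the HTML file

// get all TABLE elements recursively; SelectNodes returns null when nothing matches
var tables = doc.DocumentNode.SelectNodes("//table");
int count = tables == null ? 0 : tables.Count;

// output to a text file
File.WriteAllText("output.txt", count.ToString());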
Simon Mourier
  • Simon, I was able to get your code to work for me. Being not that familiar with C# it took a little work but I learned something. Thanks! – user1944272 Apr 28 '13 at 19:17
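
The question also asks about reading values from specific tables, doing math on them, and writing the results to a text file. One possible way to combine the two answers is sketched below. It assumes each table's header row uses `th` cells and that the column of interest holds whole numbers, as in the sample file. The names `TableMath`, `SumColumn`, `tables.html`, and `output.txt` are illustrative choices, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class TableMath
{
    // Sums the integer cells under the header named columnName in the table at
    // tableIndex (0-based). Returns null if the table or the column does not exist.
    public static int? SumColumn(HtmlDocument doc, int tableIndex, string columnName)
    {
        var tables = doc.DocumentNode.Descendants("table").ToList();
        if (tableIndex < 0 || tableIndex >= tables.Count) return null;

        var rows = tables[tableIndex].Descendants("tr").ToList();

        // Assumption: the first row that contains th cells is the header row.
        var headerRow = rows.FirstOrDefault(r => r.Elements("th").Any());
        if (headerRow == null) return null;

        var headers = headerRow.Elements("th").Select(th => th.InnerText.Trim()).ToList();
        int col = headers.IndexOf(columnName);
        if (col < 0) return null;

        int sum = 0;
        foreach (var row in rows)
        {
            var cells = row.Elements("td").ToList();
            // Rows without enough td cells (such as the header row) are ignored,
            // as are cells that do not parse as integers.
            if (cells.Count > col && int.TryParse(cells[col].InnerText.Trim(), out int value))
                sum += value;
        }
        return sum;
    }

    public static void Main(string[] args)
    {
        // "tables.html" is a hypothetical default; pass your own path as the first argument.
        string path = args.Length > 0 ? args[0] : "tables.html";

        var doc = new HtmlDocument();
        doc.Load(path);

        int count = doc.DocumentNode.Descendants("table").Count();
        var lines = new List<string> { "Tables: " + count };

        for (int i = 0; i < count; i++)
        {
            int? sum = SumColumn(doc, i, "Number");
            lines.Add("Table " + (i + 1) + ": " +
                      (sum.HasValue ? "Number sum = " + sum.Value : "no Number column"));
        }

        // Write one line per result to a new text file.
        File.WriteAllLines("output.txt", lines);
    }
}
```

For the sample file in the question, this should write a count of 3, a sum of 74 for the first table, no Number column for the second, and a sum of 15 for the third.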