1

I'd like to read the information in this table (it always has the same layout) in C#. It's a teacher substitution plan, and I'd like to integrate it into my school timetable.

Jeff Brogan
  • 1
    Short answer: you might need to strip and parse the table. If it's always constant, it might be a bit faster. – mahlatse Dec 06 '18 at 14:17
  • 1
    Sounds like you're looking for something called a "DOM Parser". Perhaps something like HTMLAgilityPack. A Google search should get you started on that. – David Dec 06 '18 at 14:19

3 Answers

4

You can use a third-party library such as HtmlAgilityPack to parse the HTML into data that you can query with LINQ.

Based on this Stack Overflow post, the following becomes simpler:

// requires: using System.Data; using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlCode); // htmlCode holds the page's HTML as a string
var headers = doc.DocumentNode.SelectNodes("//tr/th");
DataTable table = new DataTable();
foreach (HtmlNode header in headers)
    table.Columns.Add(header.InnerText); // create columns from th
// select rows with td elements
foreach (var row in doc.DocumentNode.SelectNodes("//tr[td]"))
    table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());

You can create a custom class for your specific table and check the attributes of the table's td or th elements to work out which cell maps to which property,

e.g.

var myTableClass = new TableClass();
myTableClass.Name = row[0].ToString(); // row is a DataRow from the table built above
.....

That will make things simpler for you.
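
As a rough sketch of that mapping, assuming the rows have already been loaded into a DataTable as in the snippet above. The SubstitutionEntry class, its Name and OtherCells properties, and the SubstitutionMapper helper are hypothetical names, and treating column 0 as the name only follows the row[0] example; the real column layout is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical row type: the property names are placeholders,
// not the real columns of the substitution plan.
public class SubstitutionEntry
{
    public string Name { get; set; } = "";
    public List<string> OtherCells { get; set; } = new List<string>();
}

public static class SubstitutionMapper
{
    // Converts every DataRow into a SubstitutionEntry.
    // Assumption: column 0 holds the name; all remaining cells are kept as text.
    public static List<SubstitutionEntry> FromDataTable(DataTable table)
    {
        var entries = new List<SubstitutionEntry>();
        foreach (DataRow row in table.Rows)
        {
            // Turn each cell into trimmed text; null and DBNull become "".
            var cells = row.ItemArray
                .Select(c => (Convert.ToString(c) ?? "").Trim())
                .ToList();
            if (cells.Count == 0) continue; // table without columns

            entries.Add(new SubstitutionEntry
            {
                Name = cells[0],
                OtherCells = cells.Skip(1).ToList()
            });
        }
        return entries;
    }
}
```

Calling SubstitutionMapper.FromDataTable(table) then gives you a typed list to query with LINQ instead of indexing into raw rows. Once you know the real columns, replace OtherCells with named properties.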

mahlatse
0

Okay, I found the solution that works best for me:

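// requires: using System.Collections.Generic; using HtmlAgilityPack;
var rows = new List<List<string>>();
var web = new HtmlWeb();
var doc = web.Load(url); // url is the address of the substitution plan page
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
{
    foreach (HtmlNode row in table.SelectNodes("tr"))
    {
        var cells = row.SelectNodes("td");
        if (cells == null) continue; // SelectNodes returns null when nothing matches, e.g. header rows with only th
        var temprow = new List<string>();
        foreach (HtmlNode cell in cells)
        {
            temprow.Add(cell.InnerText);
        }
        rows.Add(temprow);
    }
}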
Jeff Brogan
  • 1
    I still do not see why my answer was unaccepted if you are using almost the same logic. You opted to use the table element instead of the table data element. – mahlatse Dec 07 '18 at 21:40
0
// requires: using System.Data; using System.Linq; using HtmlAgilityPack;
private DataTable GetHtmlTable (string urlStr, int i) {
  DataTable dt = new DataTable();
  var web = new HtmlWeb();
  var doc = web.Load(urlStr);
  // XPath positions are 1-based: //table[i] matches a table that is
  // the i-th <table> child of its parent
  HtmlNode table = doc.DocumentNode
    .SelectSingleNode(
      string.Format(
        "//table[{0}]", i
    ));

  // notice the dot: it makes the query relative to this table
  var headers = table.SelectNodes(".//tr/th");

  foreach (HtmlNode header in headers)
    dt.Columns.Add(
      header.InnerText.Replace(
        "&nbsp;", ""
    ));

  // notice the dot
  foreach (var row in table.SelectNodes(".//tr[td]"))
    dt.Rows.Add(
      row.SelectNodes("td")
        .Select(td => td.InnerText.Replace(
          "&nbsp;", ""
        )).ToArray()
    );

  return dt;
}
sanitizedUser