How can I parse this in C#

Question

I am a novice using C# to scrape sites. I understand how to find hrefs and how to handle really simple tables.

Now I want to parse the HTML below and pick out just the text of the first cell (i.e. 'Office Manager') and the href:

<tr>
  <td>Office Manager</td>
  <td>Office & Admin</td>
  <td>Cambridge</td>
  <td class="btn-wrapper desktop-btn"><a href="http://www.itoworld.com/office-manager/" class="std-btn">Find out more</a></td>
</tr>
<tr class="mobile-btn">
  <td colspan="3" class="btn-wrapper"><a href="http://www.itoworld.com/office-manager/" class="std-btn">Find out more</a></td>
</tr>

Also, can anyone recommend a site where I can learn my way around the world of nodes, tds and trs?

Not very clear, but the answer is probably HtmlAgilityPack. – H H, Jul 22 '17 at 13:37

score 0 · Answer 1 · answered Jul 22 '17 at 13:39


You can use the CsQuery library (available on NuGet) to parse HTML with jQuery-style selectors:

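using CsQuery; // from the CsQuery NuGet package

var page = new CQ(html); // html holds the page source as a string
var firstManagerHref = page.Find("a.std-btn:first()").Attr("href"); // href of the first <a class="std-btn">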


opewix


score -1 · Accepted Answer · answered Jul 22 '17 at 13:38


If you want to retrieve information from HTML, I'd recommend using a library such as the HTML Agility Pack:

http://html-agility-pack.net/
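To make the suggestion concrete for the markup in the question, here is a minimal sketch (not part of the original answer). It assumes the `HtmlAgilityPack` NuGet package is installed; the class `JobParser`, the method `ExtractJobs` and the XPath expressions are hypothetical choices tailored to this one snippet, not a general scraping recipe. The question's HTML is wrapped in a `<table>` so the rows have a parent.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumed: NuGet package "HtmlAgilityPack"

public static class JobParser
{
    // Markup from the question, wrapped in <table> (verbatim string, "" = quote).
    public const string SampleHtml = @"<table>
<tr>
  <td>Office Manager</td>
  <td>Office & Admin</td>
  <td>Cambridge</td>
  <td class=""btn-wrapper desktop-btn""><a href=""http://www.itoworld.com/office-manager/"" class=""std-btn"">Find out more</a></td>
</tr>
<tr class=""mobile-btn"">
  <td colspan=""3"" class=""btn-wrapper""><a href=""http://www.itoworld.com/office-manager/"" class=""std-btn"">Find out more</a></td>
</tr>
</table>";

    // Returns (title, href) for every row that has a desktop "Find out more" cell.
    public static List<(string Title, string Href)> ExtractJobs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Title, string Href)>();

        // Only rows containing a td whose class includes "desktop-btn";
        // this skips the "mobile-btn" row, which repeats the same link.
        var rows = doc.DocumentNode.SelectNodes("//tr[td[contains(@class, 'desktop-btn')]]");
        if (rows == null) return results; // SelectNodes returns null when nothing matches

        foreach (var row in rows)
        {
            // Relative XPath ("./", ".//") so the search stays inside this row.
            var firstCell = row.SelectSingleNode("./td[1]");
            var link = row.SelectSingleNode(".//a[@href]");
            if (firstCell == null || link == null) continue;

            string title = HtmlEntity.DeEntitize(firstCell.InnerText).Trim();
            string href = link.GetAttributeValue("href", string.Empty);
            results.Add((title, href));
        }
        return results;
    }

    public static void Main()
    {
        foreach (var (title, href) in ExtractJobs(SampleHtml))
            Console.WriteLine($"{title} -> {href}");
    }
}
```

The XPath filter on the `desktop-btn` cell is one way to avoid reading the same job twice; if the real page lays rows out differently, that predicate is the part to adjust. Note that in HtmlAgilityPack an XPath starting with `//` searches from the document root even when called on a child node, which is why the per-row lookups use `./` and `.//`.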


FunkyPeanut

