
I am a novice using C# to scrape sites. I understand how to find hrefs and how to handle really simple tables.

Now I want to parse the markup below and pick out just the text of the first cell (i.e. 'Office Manager') and the href.

<tr>
  <td>Office Manager</td>
  <td>Office & Admin</td>
  <td>Cambridge</td>
  <td class="btn-wrapper desktop-btn"><a href="http://www.itoworld.com/office-manager/" class="std-btn">Find out more</a></td>
</tr>
<tr class="mobile-btn">
  <td colspan="3" class="btn-wrapper"><a href="http://www.itoworld.com/office-manager/" class="std-btn">Find out more</a></td>
</tr>

Also, can anyone recommend a site where I can learn my way around the world of nodes, tds and trs?

GSerg
Peter

2 Answers


You can use the CsQuery library (available on NuGet) to parse HTML with jQuery-style selectors:

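using CsQuery;
var page = new CQ(html); // html holds the page source as a string
// jQuery's :first pseudo-selector takes no parentheses; it keeps only the first match
var firstManagerHref = page.Find("a.std-btn:first").Attr("href");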
opewix

If you want to extract information from HTML, I'd recommend a dedicated parsing library such as Html Agility Pack:

http://html-agility-pack.net/
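
As a rough sketch of how that library could be applied to the markup in the question: the code below assumes the HtmlAgilityPack NuGet package is installed and that the rows sit inside a `<table>` element on the real page. The class name `JobRowParser` and the method `ParseFirstJob` are made up for illustration, and the sample HTML is trimmed from the question.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using HtmlAgilityPack;

public static class JobRowParser
{
    // Hypothetical helper: returns the text of the first cell and the first
    // link target from the first <tr> that contains <td> cells.
    public static (string Title, string Href) ParseFirstJob(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns the first match, or null if nothing matches.
        var row = doc.DocumentNode.SelectSingleNode("//tr[td]");
        if (row == null)
            throw new InvalidOperationException("No table row with cells found.");

        var titleCell = row.SelectSingleNode("./td");   // first <td> in the row
        var link = row.SelectSingleNode(".//a[@href]"); // first link in the row

        // DeEntitize turns entities such as &amp; back into plain characters.
        var title = HtmlEntity.DeEntitize(titleCell.InnerText).Trim();
        var href = link?.GetAttributeValue("href", string.Empty) ?? string.Empty;
        return (title, href);
    }

    public static void Main()
    {
        // Sample markup adapted from the question, wrapped in <table> (an assumption).
        const string html = @"<table>
<tr>
  <td>Office Manager</td>
  <td>Office & Admin</td>
  <td>Cambridge</td>
  <td class=""btn-wrapper desktop-btn""><a href=""http://www.itoworld.com/office-manager/"" class=""std-btn"">Find out more</a></td>
</tr>
</table>";

        var (title, href) = ParseFirstJob(html);
        Console.WriteLine($"{title} -> {href}");
    }
}
```

To collect every job rather than just the first, `SelectNodes("//tr[td]")` could be looped over instead; note that it returns null rather than an empty collection when there are no matches, and that the `mobile-btn` rows in the question hold only a link, so they would need to be skipped.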

FunkyPeanut