
Is there a good tutorial or sample for learning HTTP web scraping? How do I start developing a tool that can search some web sites and download specific information, so that I can collect it automatically and then analyse it? Thanks!

Darin Dimitrov
willyMon

1 Answer


A tool commonly recommended for this is the Html Agility Pack, a .NET library. It takes malformed HTML, massages it into XHTML, and exposes it as a traversable DOM, so it is very useful for the markup you find in the wild, as opposed to approaches like regular expressions, which tend to break as soon as the markup varies.
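As a rough illustration of the traversable-DOM idea, here is a minimal C# sketch, assuming the `HtmlAgilityPack` NuGet package is referenced. The HTML string, the XPath query, and the choice to list links are made up for the example and are not from the original answer; the input deliberately leaves its `<p>` and `<li>` tags unclosed.

```csharp
using System;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

class Program
{
    static void Main()
    {
        // Hypothetical, deliberately sloppy markup: <p> and <li> are never closed.
        const string html =
            "<html><body><p>Links<ul>" +
            "<li><a href='/a'>First</a>" +
            "<li><a href='/b'>Second</a>" +
            "</ul></body></html>";

        // Parse the string into a DOM that can be queried.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Query the DOM with XPath: every <a> element that has an href attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (var link in links)
        {
            // The second argument is the default returned if the attribute is missing.
            var href = link.GetAttributeValue("href", string.Empty);
            Console.WriteLine($"{link.InnerText.Trim()} -> {href}");
        }
    }
}
```

To work against a live page rather than a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way. Check the target site's terms of use before collecting data automatically.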

There are some examples and the API documentation here:

http://html-agility-pack.net/api

Some useful links:

wp78de
D'Arcy Rittich