
Well, I am using WebClient.DownloadString in order to scrape a webpage. Unfortunately, DownloadString only gets me the page source without the CSS and JS updates (which are made in Internet Explorer while the page loads).

So I was wondering: how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does (with the CSS and JS code injections)?
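
Roughly speaking, this is all my code does at the moment (a minimal sketch; the URL is just a placeholder):

    using System;
    using System.Net;

    class Scraper
    {
        static void Main()
        {
            using (var client = new WebClient())
            {
                // DownloadString only returns the raw HTML the server sends back,
                // before any JavaScript has had a chance to run in a browser.
                string html = client.DownloadString("http://example.com/");
                Console.WriteLine(html);
            }
        }
    }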

Erric J Manderin

2 Answers


So I was wondering how can I use WebClient to load the whole page the same way internet explorer or WebBrowser control does ?

You can't do that. The WebClient class is used to download a SINGLE resource using the HTTP protocol. It doesn't understand the concept of HTML. If you need to download the resources associated with this HTML, you will have to use an HTML parser (such as HTML Agility Pack, for example) and, for each CSS and JavaScript reference you encounter in the downloaded HTML page, send another HTTP request with the WebClient to retrieve it.

But bear in mind that, depending on the webpage you are trying to scrape, things might get more complicated. For example, the web page could have JavaScript which in turn dynamically references and includes other static resources such as JavaScript or CSS. A WebClient, since it doesn't execute JavaScript, might never know about them.
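
As a rough sketch of that approach (assuming the HTML Agility Pack package is installed; the URL is a placeholder and error handling is omitted):

    using System;
    using System.Net;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            var pageUri = new Uri("http://example.com/"); // placeholder

            using (var client = new WebClient())
            {
                // 1. Download the HTML page itself (a single resource).
                string html = client.DownloadString(pageUri);

                // 2. Parse it to find the CSS and JavaScript it references.
                var doc = new HtmlDocument();
                doc.LoadHtml(html);

                var cssLinks = doc.DocumentNode.SelectNodes("//link[@rel='stylesheet'][@href]");
                var scripts  = doc.DocumentNode.SelectNodes("//script[@src]");

                // 3. Send one extra HTTP request per referenced resource,
                //    resolving relative paths against the page URL.
                if (cssLinks != null)
                {
                    foreach (var link in cssLinks)
                    {
                        var cssUri = new Uri(pageUri, link.GetAttributeValue("href", ""));
                        string css = client.DownloadString(cssUri);
                    }
                }

                if (scripts != null)
                {
                    foreach (var script in scripts)
                    {
                        var jsUri = new Uri(pageUri, script.GetAttributeValue("src", ""));
                        string js = client.DownloadString(jsUri);
                    }
                }
            }
        }
    }

Note that this only fetches the resources that are statically referenced in the markup; anything injected later by JavaScript will still be missed, for the reason explained above.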

Darin Dimitrov
  • First, thank you very much for your comment. Second, is there any other simpler way to do that? – Erric J Manderin Jul 07 '13 at 15:11
  • Yes, you will need an HTML parsing engine such as HTML Agility Pack in order to interpret the downloaded HTML from the first request and, for each `<link>` and `<script>` reference, send a separate request. – Darin Dimitrov Jul 07 '13 at 15:12

The best solution for you is HTML Agility Pack ( https://htmlagilitypack.codeplex.com/ ). It will download all the content of the webpage for you, but I'm not sure if you can get the CSS + JavaScript code using this tool.