Can anyone tell me whether there's a way (and if so, how) to load a website in a form's browser control without loading (downloading) specific content such as images, videos, and Flash?

I'm trying to create a web scraper to access some information. The problem is that I need to log in to the site, so I have to scrape it through the browser (at least, that's the only way I know). Because of this, the loading time is huge, since the browser loads all the images and other unneeded data on the site.

Is there a way to repeatedly fetch just the page's source code, instead of the entire site, so I can check it for changes?

If you know of any better methods, please let me know. I'm pretty new to coding in general, so the information would be very helpful.

Jose Cancel

Answer

HTML

Downloading only the HTML is straightforward:

using System.Net;

// Downloads only the HTML source; images, videos, etc. are not fetched.
using (var client = new WebClient())
{
    var html = client.DownloadString("http://google.com");
}

Images, videos, and other resources are not downloaded this way; the HTML only contains their URLs.

Checking for changes

Once you have the HTML, checking whether anything has changed since the last download is simple: calculate a hash of the HTML and compare it with the previous hash. MD5, for example, is sufficient for this.
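A minimal sketch of that comparison, assuming the HTML has already been downloaded into a string (for example with `DownloadString`). The class `PageChangeDetector` and its methods `HashHtml` and `HasChanged` are hypothetical names used only for illustration. MD5 is fine for detecting changes like this, though it should not be relied on for anything security-related.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: remembers the hash of the last HTML it was given.
public class PageChangeDetector
{
    private string _lastHash; // null until the first page has been checked

    // Computes the MD5 hash of the HTML as an uppercase hex string.
    public static string HashHtml(string html)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(html));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Returns true if the HTML differs from the previously checked version
    // (or if this is the first check), then remembers the new hash.
    public bool HasChanged(string html)
    {
        string hash = HashHtml(html);
        bool changed = hash != _lastHash;
        _lastHash = hash;
        return changed;
    }
}
```

Keep one `PageChangeDetector` instance alive and call `HasChanged` with each freshly downloaded page; only the 32-character hash is stored between checks, not the whole page.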

Login

First, check whether the site offers an API. If it does, it most likely supports some form of authentication, often something like HTTP Basic Authentication, so all you need to do is send authenticated requests. For making such requests, read about RestSharp.
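As an illustration of an authenticated request, here is a sketch that uses the built-in `HttpClient` instead of RestSharp. It assumes the site's API accepts HTTP Basic Authentication; the class `BasicAuthExample`, its methods, and any URL or credentials you pass in are hypothetical. Basic credentials are only Base64-encoded, not encrypted, so such an API should be called over HTTPS.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, assuming the site's API accepts HTTP Basic Authentication.
public static class BasicAuthExample
{
    // Builds the header value "Basic base64(user:password)".
    public static AuthenticationHeaderValue CreateBasicAuthHeader(string user, string password)
    {
        string credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes(user + ":" + password));
        return new AuthenticationHeaderValue("Basic", credentials);
    }

    // Sends an authenticated GET request and returns the response body as a string.
    public static async Task<string> GetHtmlAsync(HttpClient client, string url, string user, string password)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            request.Headers.Authorization = CreateBasicAuthHeader(user, password);
            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode(); // throws on 401, 404, etc.
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

Reusing a single `HttpClient` across requests, rather than creating one per call, is the usual pattern. RestSharp wraps the same idea in its own request objects.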

ebvtrnog