I'm trying to port some VBA code I wrote for parsing web pages to VB.NET.

Currently I use:

IE = CreateObject("internetexplorer.application")

and navigate within web pages.

Now I'm trying to make the code better (and faster), and I'm thinking about using "System.Threading" to open several web pages at the same time (each page takes about 5-10 seconds to open because the site is slow).

I have read a lot of guides and threads, but the more I read, the more confused I get.

Writing code isn't easy for me (I'm self-taught), and I don't want to waste time and effort going in the wrong direction.

Currently I open one web page at a time and then extract some text from it by tag.

I have to open two types of web pages: 1) First type: I have the exact URL; 2) Second type: I need to fill in a form to get the text I need.

Is there one good approach that works for both types? If not, what's the best approach for each of them?

genespos

1 Answer

You should avoid creating an instance of Internet Explorer or any other browser-related control, because it uses far more RAM than the alternatives, especially if you run web requests in parallel.

Consider the following approach:

  • Perform the HTTP requests via HttpWebRequest (you should wrap this in a class).
  • Parse the content via HtmlAgilityPack (put this in a separate class as well).
  • Create a class that builds the real URL from the information you collected in the previous step.
  • Reuse your wrapper class around HttpWebRequest to fetch the page you are looking for.
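The first two steps might look roughly like the sketch below. It is only an illustration, not a finished implementation: the class names `PageFetcher` and `PageParser`, the method names, and the example URL are made up for this example, and it assumes the HtmlAgilityPack NuGet package is referenced by the project.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net

' Hypothetical wrapper around HttpWebRequest (the name is illustrative).
Public Class PageFetcher
    ' Downloads the page at the given URL and returns its HTML as a string.
    Public Function GetHtml(url As String) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        Using response As WebResponse = request.GetResponse()
            ' StreamReader turns the response stream into text (UTF-8 unless a BOM says otherwise).
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Class

' Hypothetical parser class built on HtmlAgilityPack (the name is illustrative).
Public Class PageParser
    ' Returns the trimmed inner text of every element with the given tag name.
    Public Function GetTextByTag(html As String, tagName As String) As List(Of String)
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of String)
        Dim nodes As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//" & tagName)
        ' SelectNodes returns Nothing when no element matches.
        If nodes IsNot Nothing Then
            For Each node As HtmlAgilityPack.HtmlNode In nodes
                result.Add(node.InnerText.Trim())
            Next
        End If
        Return result
    End Function
End Class

Module FetchAndParseSketch
    Sub Main()
        ' Placeholder URL; replace it with the page you actually need.
        Dim html As String = New PageFetcher().GetHtml("http://example.com/")
        For Each text As String In New PageParser().GetTextByTag(html, "h1")
            Console.WriteLine(text)
        Next
    End Sub
End Module
```

Keeping the download and the parsing in separate classes means the same `PageFetcher` can be reused for the second request once the real URL has been built.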

You should check out the Task class that comes with the .NET Framework, and have a look at the "async" keyword, to get a first overview of the options you have for parallelisation.

Only use threads directly if you really want to handle all the threading details yourself, which can be complicated if you are doing it for the first time.
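As a rough idea of what the Task/async route can look like, here is a minimal sketch that starts several downloads at once and waits for all of them. It assumes .NET Framework 4.5 or later (needed for `Async`/`Await`, `GetResponseAsync` and `ReadToEndAsync`); the module name, function names and URLs are placeholders invented for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Threading.Tasks

Module ParallelFetchSketch
    ' Downloads one page without blocking a thread while the slow server responds.
    Async Function GetHtmlAsync(url As String) As Task(Of String)
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        Using response As WebResponse = Await request.GetResponseAsync()
            Using reader As New StreamReader(response.GetResponseStream())
                Return Await reader.ReadToEndAsync()
            End Using
        End Using
    End Function

    ' Starts every download immediately, then waits until all of them have finished.
    Async Function GetAllAsync(urls As IEnumerable(Of String)) As Task(Of String())
        Dim downloads As New List(Of Task(Of String))
        For Each url As String In urls
            downloads.Add(GetHtmlAsync(url))
        Next
        ' The results come back in the same order as the tasks in the list.
        Return Await Task.WhenAll(downloads)
    End Function

    Sub Main()
        ' Placeholder URLs; replace them with the pages you need.
        Dim urls As String() = {"http://example.com/page1", "http://example.com/page2"}
        ' A console Main cannot be Async here, so block once at the top level.
        Dim pages As String() = GetAllAsync(urls).GetAwaiter().GetResult()
        For i As Integer = 0 To pages.Length - 1
            Console.WriteLine("{0}: {1} characters", urls(i), pages(i).Length)
        Next
    End Sub
End Module
```

If all pages come from the same site, the number of simultaneous connections per host may be capped (on the .NET Framework the default for client applications is 2); raising `ServicePointManager.DefaultConnectionLimit` before starting the requests is one way to lift that cap, if the site tolerates it.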

Stefan Wanitzek
  • Thanks for your answer. It will be really complicated (for me), but I won't give up. Could you suggest a guide for HttpWebRequest? (I can't get the stream into an HTML document) – genespos Jun 09 '15 at 09:37
  • The right approach is to convert the stream into a string and then pass the string to LoadHtml(). I am sure you will find some examples of how to get a string out of your stream. If you don't succeed, I will try to find a few minutes to prepare an example for you. – Stefan Wanitzek Jun 09 '15 at 09:56
  • Do I need HtmlAgilityPack to use LoadHtml()? – genespos Jun 09 '15 at 10:03
  • Yes. HtmlAgilityPack offers the class HtmlAgilityPack.HtmlDocument, which has the method LoadHtml that takes a string of HTML code. See the code of the accepted answer: http://stackoverflow.com/questions/19870116/using-htmlagilitypack-for-parsing-a-web-page-information-in-c-sharp (it avoids HttpWebRequest and is even simpler than my approach) – Stefan Wanitzek Jun 09 '15 at 10:04