-1

I want to know how I can parse a website using VB.NET. Basically, there is a table on a page, and I need to update the database with its contents. So I need to be able to capture whatever is in the table, then get all the columns and rows and put them in a DataTable object. Any help is appreciated.

Rob Schneider
  • Good question! Nothing at this stage, I'm just reading up on a few things. I know that with something like IHTMLInputElement I can parse an HTML file, but what I'm looking for is a way to parse a web page without needing to download it. I'm also not sure whether there is a better or easier way, which is why I'm asking here. Cheers – Rob Schneider Jul 06 '12 at 14:17
  • [Regular Expressions?](http://instantrimshot.com/index.php?sound=rimshot&play=true) – Greg B Aug 08 '13 at 12:56

2 Answers

2

Start by using the HTML Agility Pack. Give the parsing a shot and come back with specific questions if you run into issues.

You don't have to save it to a file; you can load it in memory:

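    // Fetch the page over HTTP and parse it in memory (no file is written).
    HtmlAgilityPack.HtmlWeb w = new HtmlAgilityPack.HtmlWeb();
    var doc = w.Load("http://www.microsoft.com/en-us/default.aspx");
    // Show the parsed markup; textBox1 is a TextBox on the form.
    textBox1.Text = doc.DocumentNode.OuterHtml;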
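Since the question asks for the table's rows and columns in a DataTable, here is a minimal sketch of one way to do that once the document is loaded. The `TableParser` class, the `FirstTableToDataTable` method and the generated column names (`Column1`, `Column2`, ...) are made up for illustration; only the `HtmlAgilityPack` and `System.Data` calls are real. It assumes the first `<table>` on the page is the one you want, treats a header row as ordinary data, and would also pick up rows of any nested tables.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class TableParser
{
    // Hypothetical helper: copies the first <table> in the document into a DataTable.
    public static DataTable FirstTableToDataTable(HtmlDocument doc)
    {
        var result = new DataTable();

        // SelectSingleNode returns null when nothing matches the XPath.
        var tableNode = doc.DocumentNode.SelectSingleNode("//table");
        if (tableNode == null) return result;

        // SelectNodes also returns null (not an empty collection) when nothing matches.
        var rows = tableNode.SelectNodes(".//tr");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            // Direct <th>/<td> children of this row.
            var cells = row.SelectNodes("th|td");
            if (cells == null) continue;

            // Decode entities such as &amp; and trim surrounding whitespace.
            var values = cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToArray();

            // Add generic columns as needed so that wider rows still fit.
            while (result.Columns.Count < values.Length)
                result.Columns.Add("Column" + (result.Columns.Count + 1));

            result.Rows.Add(values);
        }

        return result;
    }
}
```

With the `doc` loaded as above, the call would be `DataTable table = TableParser.FirstTableToDataTable(doc);`. The same calls are available from VB.NET, since HTML Agility Pack is an ordinary .NET library.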
John Koerner
  • Thank you sir, Looks interesting. – Rob Schneider Jul 06 '12 at 14:20
  • Okay, I just gave this a try and have a question. The page that I want to parse is an ASP page. I created a variable of the HtmlWeb class and then passed it the link to the page as well as the path, but it didn't work. What needs to be done in order to get the HTML content of an ASP page? – Rob Schneider Jul 06 '12 at 14:40
  • "It didn't work" doesn't provide any useful information. If you are getting an error, post a new question and show what code you have tried and clearly explain what you are trying to do. – John Koerner Jul 06 '12 at 14:47
  • Well, I only wrote two lines of code. First I declared a new object of type HtmlWeb, and then I called the method Get(url,path) with the URL of the ASP page. It didn't download the file to the path, so something is not working. Could you let me know how I can get the HTML elements of an ASP page, and whether it is possible to do so with HTML Agility Pack? Cheers – Rob Schneider Jul 06 '12 at 15:03
  • I added an example. Go through the documentation that is available for download, it will help. – John Koerner Jul 06 '12 at 15:14
  • Exactly what I wanted, Cheers – Rob Schneider Jul 06 '12 at 15:26
0

Each of these windows can contain an HTML document.

A file that specifies how the screen is divided into frames is called a frameset.

If you want to make a homepage that uses frames, you should make an HTML document with the frameset, then make the normal HTML documents that should be loaded into each of these frames.

When a frameset page is loaded, the browser automatically loads each of the pages associated with the frames.

bunty