If you look into XPath and are not averse to using open source third-party tools, the HTML Agility Pack (Code Examples) is supposed to be a great tool for parsing HTML.
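As a rough sketch of what that could look like, the function below loads the page source into the Agility Pack and pulls the heading text with one XPath expression. It assumes the HtmlAgilityPack package is referenced and that the results still use the div id "search" and h3 class "r" markup described further down; the module and function names are made up for illustration.

```vb
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module AgilityPackSketch
    ' Hypothetical helper: returns the visible text of every <h3 class="r">
    ' found under <div id="search">. The selectors are assumptions about the page.
    Function GetResultTitles(sourceCode As String) As List(Of String)
        Dim titles As New List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(sourceCode)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes = doc.DocumentNode.SelectNodes("//div[@id='search']//h3[@class='r']")
        If nodes Is Nothing Then Return titles

        For Each node As HtmlNode In nodes
            ' InnerText drops the tags; DeEntitize turns entities such as &amp; back into characters.
            titles.Add(HtmlEntity.DeEntitize(node.InnerText).Trim())
        Next
        Return titles
    End Function
End Module
```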
Another option, which can be a pain, is to convert the source HTML string into a valid XML document and then parse it using VB's XML namespace. I have done this in an application I use to parse YouTube playlists. The issue with this approach is that it takes a bit of manual cleaning of the HTML string before you can turn it into an XML document.
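A minimal sketch of the XML route, using System.Xml.Linq, might look like the following. It assumes the string has already been cleaned into well-formed XML with no default namespace (unclosed tags fixed, entities such as &nbsp; replaced, since XDocument.Parse throws an XmlException otherwise). The names and the h3 class "r" selector are illustrative assumptions.

```vb
Imports System.Collections.Generic
Imports System.Xml.Linq

Module XmlSketch
    ' Hypothetical helper: cleanedHtml must already be well-formed XML.
    Function GetResultTitles(cleanedHtml As String) As List(Of String)
        Dim titles As New List(Of String)
        Dim doc As XDocument = XDocument.Parse(cleanedHtml)

        For Each h3 As XElement In doc.Descendants("h3")
            Dim cls As XAttribute = h3.Attribute("class")
            If cls IsNot Nothing AndAlso cls.Value = "r" Then
                ' Value concatenates the text of all descendant nodes, without the tags.
                titles.Add(h3.Value.Trim())
            End If
        Next
        Return titles
    End Function
End Module
```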
Lastly, you could try to digest the HTML string using string methods only. However, this is going to be error prone and will again depend heavily on the structure of the document.
Whichever method you choose for parsing the HTML, Google search results currently contain a div with the ID 'search'. From a purely string standpoint, you could search for it in your source string like this:
Dim searchTerm As String = "<div id=""search"""
' IndexOf returns -1 when the term is not found
Dim searchLoc As Integer = sourceCode.IndexOf(searchTerm)
Once you know where the search results section starts, you can search first for "<li class=""g""" tokens and then for "<h3 class=""r""" tokens inside those. The result text is inside the h3. You would consume up to the first </li> and </h3> respectively to get the tokens.
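To make the token hunting concrete, here is one way it could be sketched with IndexOf and Substring. TakeToken and GetHeadingTokens are hypothetical names, and the code inherits every weakness of the string approach: it assumes the attributes appear exactly as written and that no li is nested inside a result item.

```vb
Imports System
Imports System.Collections.Generic

Module StringScanSketch
    ' Returns the text from the first openTag at or after startAt through the first
    ' closeTag that follows it, or Nothing when either is missing.
    ' nextPos receives the index just past the match.
    Function TakeToken(source As String, openTag As String, closeTag As String,
                       startAt As Integer, ByRef nextPos As Integer) As String
        Dim startLoc As Integer = source.IndexOf(openTag, startAt, StringComparison.OrdinalIgnoreCase)
        If startLoc < 0 Then Return Nothing
        Dim endLoc As Integer = source.IndexOf(closeTag, startLoc, StringComparison.OrdinalIgnoreCase)
        If endLoc < 0 Then Return Nothing
        nextPos = endLoc + closeTag.Length
        Return source.Substring(startLoc, nextPos - startLoc)
    End Function

    ' Collects the raw <h3 class="r">...</h3> markup from each <li class="g"> result.
    Function GetHeadingTokens(sourceCode As String) As List(Of String)
        Dim headings As New List(Of String)
        Dim pos As Integer = sourceCode.IndexOf("<div id=""search""", StringComparison.OrdinalIgnoreCase)
        If pos < 0 Then Return headings

        Do
            Dim nextPos As Integer = 0
            Dim item As String = TakeToken(sourceCode, "<li class=""g""", "</li>", pos, nextPos)
            If item Is Nothing Then Exit Do
            pos = nextPos

            Dim ignored As Integer = 0
            Dim heading As String = TakeToken(item, "<h3 class=""r""", "</h3>", 0, ignored)
            If heading IsNot Nothing Then headings.Add(heading)
        Loop
        Return headings
    End Function
End Module
```

Each string in the returned list still contains its tags, so it needs the sanitizing pass described next.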
Once you have this text, you need to sanitize it by searching through it and removing the HTML tags. You could easily write an algorithm that consumes just the link text by looping through the indexes of key characters.
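One simple version of that loop treats "<" and ">" as the key characters and keeps only what falls outside them. StripTags is a hypothetical name, and this is a sketch rather than a robust sanitizer: it does not decode entities such as &amp; and it will drop a literal ">" that appears in the text itself.

```vb
Imports System.Text

Module TagStripSketch
    ' Copies every character that is not part of a <...> tag.
    Function StripTags(markup As String) As String
        Dim result As New StringBuilder()
        Dim insideTag As Boolean = False

        For Each c As Char In markup
            If c = "<"c Then
                insideTag = True
            ElseIf c = ">"c Then
                insideTag = False
            ElseIf Not insideTag Then
                result.Append(c)
            End If
        Next
        Return result.ToString()
    End Function
End Module
```

For example, StripTags("<h3 class=""r""><a href=""#"">Example <em>title</em></a></h3>") returns "Example title".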
The whole point is to break the document down into smaller pieces incrementally and then digest those pieces. You will be doing this no matter how you approach it. However, using a parser of some kind and the power of XPath selector expressions would make it much easier than generating the tokens manually.
The pure string way is going to be the most difficult and also the slowest way to accomplish this. I would highly recommend finding a way to do it with some form of HTML parser; otherwise you may go mad before you get a working solution.
As a final note, it looks like you are using a WebBrowser control on your form. You can use this control and its related classes to parse the HTML of the pages it retrieves. I have done this before; it is not the most efficient way of scraping the web, but it can be very easy. Look into the HtmlDocument class for methods that work with the objects this control returns.
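A minimal sketch of the WebBrowser route, assuming a Windows Forms project and the same div id "search" and h3 class "r" markup as above, could look like this. GetResultTitles is a hypothetical helper; it should only be called once the page has finished loading, for example from the control's DocumentCompleted handler.

```vb
Imports System.Collections.Generic
Imports System.Windows.Forms

Module WebBrowserSketch
    ' Hypothetical helper: reads result headings from the page the control has loaded.
    Function GetResultTitles(browser As WebBrowser) As List(Of String)
        Dim titles As New List(Of String)
        If browser.Document Is Nothing Then Return titles

        Dim searchDiv As HtmlElement = browser.Document.GetElementById("search")
        If searchDiv Is Nothing Then Return titles

        For Each h3 As HtmlElement In searchDiv.GetElementsByTagName("h3")
            ' The class attribute is read as "className" through this API.
            If h3.GetAttribute("className") = "r" Then
                Dim title As String = h3.InnerText
                If title IsNot Nothing Then titles.Add(title.Trim())
            End If
        Next
        Return titles
    End Function
End Module
```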