
Is it possible to read information from a website that uses a standardized layout, meaning the site's controls/textboxes/buttons etc. are always in the same location, but the data/values they hold change?

In a C# WinForms application, can I open that page in the background, read some values, and use them in my form? Is there a way to reference specific areas/items on the web page, even if it is something as inefficient as tabbing exactly 12 times? Am I totally dreaming here?

Again, I don't need to click anything on the page, just read what is in a certain textbox or things of that nature.

Alexei Levenkov
ikathegreat

3 Answers


Html Agility Pack is a popular choice for doing this kind of thing.

carla
David Peden

You could also use the WebBrowser control to do this. For example, to get all of the posters in this thread and their reputation:

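private void Form1_Load(object sender, EventArgs e)
{
    // DocumentCompleted (not Navigated) fires once the page's DOM has finished loading.
    webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    webBrowser1.Navigate("http://stackoverflow.com/questions/9712699/read-website-information-display-application");
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // The event can fire once per frame; only handle the top-level page.
    if (e.Url != webBrowser1.Url || webBrowser1.Document == null)
        return;

    foreach (HtmlElement ele in webBrowser1.Document.GetElementsByTagName("SPAN"))
    {
        // Relies on the page marking reputation spans with title="reputation score".
        if (ele.GetAttribute("title") == "reputation score")
        {
            // The first child of the span's parent holds the user name.
            MessageBox.Show(ele.Parent.Children[0].InnerText + " - " + ele.InnerHtml);
        }
    }
}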
John Koerner

Sure, you can do this. The exact implementation depends on the web page, its layout, etc.

As a basic outline: use a WebClient to download the web page as a string, then use a Regex to extract the matching part of the HTML. Approaches like "hit Tab x times" won't work, or are hard to implement, because you'd have to either embed a browser control or parse the HTML yourself.
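A minimal sketch of the outline above, assuming the value you want sits as plain text inside a simple element with a known `id` attribute (e.g. `<span id="price">19.99</span>`). It swaps `WebClient` for `HttpClient`, its newer replacement in .NET; `PageScraper`, `DownloadPageAsync` and `ExtractTextById` are hypothetical names chosen for illustration.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper class for illustration only.
public static class PageScraper
{
    // One shared HttpClient instance (the modern replacement for WebClient).
    private static readonly HttpClient Http = new HttpClient();

    // Downloads the raw HTML of a page as a string.
    public static Task<string> DownloadPageAsync(string url) => Http.GetStringAsync(url);

    // Returns the trimmed text between the tag carrying id="<id>" (or id='<id>')
    // and the next '<', or null if nothing matches. Assumes a lowercase "id"
    // attribute preceded by whitespace; only suitable for flat, predictable markup.
    public static string? ExtractTextById(string html, string id)
    {
        string pattern = "\\sid=[\"']" + Regex.Escape(id) + "[\"'][^>]*>([^<]*)<";
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hard-coded sample markup; in practice this would come from DownloadPageAsync(url).
        string html = "<div><span id=\"price\">  19.99 </span><span id='qty'>3</span></div>";
        Console.WriteLine(PageScraper.ExtractTextById(html, "price")); // 19.99
        Console.WriteLine(PageScraper.ExtractTextById(html, "qty"));   // 3
    }
}
```

In a WinForms app you would typically call `string html = await PageScraper.DownloadPageAsync(url);` from an `async` event handler and then pass the result to `ExtractTextById`. Note that a textbox's contents live in its `value` attribute rather than between tags, so reading one would need a different pattern. The regex breaks as soon as the markup gets nested or reordered, which is where an HTML parser becomes the safer choice.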

Mario
  • +1. Note that RegEx is useful for picking values from the page when there are obvious markers to locate them, like "id='aaa'>text to scrape<". If you need to find the "second span inside the third nested div", it's time to use HtmlAgilityPack as suggested by DPeden. – Alexei Levenkov Mar 15 '12 at 01:22