1

I need to write C# code to grab the contents of a web page. The steps look like this:

  1. Browse to the login page.
  2. I have a user name and a password; supply them programmatically and log in.
  3. That takes you to the detail page.
  4. Get some information from that page (product ID, description, etc.).
  5. Click (from code) on Detail View.
  6. Get the price for that product from there.
  7. Now it is done, so write a detail line to a text file like this: ABC Printer::225519::285.00

Please help me with this. (VB.NET code is fine too; I can convert it to C#.)

slugster
Buddhi Dananjaya

4 Answers

1

The WatiN library is probably what you want, then. Basically, it controls a web browser (native support for IE and Firefox, I believe, though they may have added more since I last used it) and provides an easy syntax for programmatically interacting with page elements within that browser. All you'll need are the names and/or IDs of those elements, or some unique way to identify them on the page.
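To give a rough idea of what that looks like, here is a minimal sketch that assumes WatiN is referenced and Internet Explorer is installed. The URLs and the element names and IDs (`username`, `password`, `login`, `productId`, `description`, `price`) are hypothetical placeholders; replace them with whatever the real pages use.

```csharp
using System;
using WatiN.Core;

class WatiNSketch
{
    // WatiN drives IE through COM, which needs a single-threaded apartment.
    [STAThread]
    static void Main()
    {
        // Hypothetical login URL; opening IE with it navigates there.
        using (var browser = new IE("http://example.com/login"))
        {
            // Fill in the login form and submit it (element names are assumptions).
            browser.TextField(Find.ByName("username")).TypeText("myUser");
            browser.TextField(Find.ByName("password")).TypeText("myPassword");
            browser.Button(Find.ByName("login")).Click();

            // Read values from the detail page (element IDs are assumptions).
            string productId = browser.Span(Find.ById("productId")).Text;
            string description = browser.Span(Find.ById("description")).Text;

            // Follow the "Detail View" link and read the price.
            browser.Link(Find.ByText("Detail View")).Click();
            string price = browser.Span(Find.ById("price")).Text;

            // Same layout as the line in the question: name::id::price
            Console.WriteLine(description + "::" + productId + "::" + price);
        }
    }
}
```

Because a real browser window is open the whole time, anything a user has to do by hand on the page can still be done in that window.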

David
  • That's a pretty funky library, nice find - but I assume it would require a browser to be opened. Using classes inbuilt to C# would provide a much more transparent method of retrieving and processing the data. – Seidr Dec 10 '10 at 11:21
  • Hi, yes, I downloaded that library. Nice one, thanks for sharing it with me. But I have a small issue: the site I want to get data from has a "captcha" on the login screen. Can this library handle that? It is fine to show the "captcha" and let the user enter it in the UI. A code sample would be even better. – Buddhi Dananjaya Dec 13 '10 at 09:28
1

You should be able to achieve this using the WebRequest class to retrieve pages, and the HTML Agility Pack to extract elements from HTML source.
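As a rough sketch of how the two fit together, assuming the HtmlAgilityPack assembly is referenced: the URL and the XPath expression `//span[@id='price']` are made-up placeholders, so adjust them to the markup of the real page.

```csharp
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

class AgilityPackSketch
{
    static void Main()
    {
        // Hypothetical URL of the page that shows the price.
        WebRequest request = WebRequest.Create("http://example.com/product/detail");

        // Download the raw HTML as a string.
        string html;
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            html = reader.ReadToEnd();
        }

        // Let the HTML Agility Pack build a DOM from the (possibly malformed) HTML.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The XPath is an assumption about the page; SelectSingleNode returns null on no match.
        HtmlNode priceNode = doc.DocumentNode.SelectSingleNode("//span[@id='price']");
        if (priceNode != null)
        {
            Console.WriteLine(priceNode.InnerText.Trim());
        }
        else
        {
            Console.WriteLine("Price element not found.");
        }
    }
}
```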

Tim Robinson
1

Yes, I downloaded that library. Nice one.

Thanks for sharing it with me. But I have an issue with that library: the site I want to get data from has a "captcha" on the login page.

I can enter that value if the library can show the image and wait for my input.

Can we achieve that with this library? If so, I would like to have a sample.

Buddhi Dananjaya
0

You should be able to achieve this by using two classes in C#, HttpWebRequest (to request the web pages) and perhaps XmlTextReader (to parse the HTML/XML response).

If you do not wish to use XmlTextReader, then I'd advise looking into regular expressions, as they are fantastically useful for extracting information from large bodies of text that follow a known pattern.

How to: Send Data Using the WebRequest Class

Seidr
  • (a) HTML is generally not XML; (b) [you can't parse HTML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Tim Robinson Dec 10 '10 at 11:22
  • But how do I achieve the clicking? To log in I have to click a button, and to get the price for a product I need to click a link and wait to grab the data. I think we cannot do such things with this approach, can we? – Buddhi Dananjaya Dec 10 '10 at 11:23
  • While you may not be able to parse HTML with RegEx, you are able to extract pieces of information from specific sections of a known HTML structure with it. Regarding the 'clicking', this is achieved by creating your own POST/GET requests. When you click a submit button, one of these two types of requests is sent to the form target. You would simply have to find out what data is being sent and recreate that request using the WebRequest class. – Seidr Dec 10 '10 at 11:25
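
The idea of recreating a button click as a POST request could look roughly like the sketch below. It only assumes the standard .NET classes; the helper names `BuildFormBody` and `Post` are made up for illustration, and the real URL and field names have to be read from the login form's HTML or from a network trace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class FormPostSketch
{
    // Encodes name/value pairs the way a browser encodes a submitted form.
    // Uri.EscapeDataString writes spaces as %20, which servers decode like '+'.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts.ToArray());
    }

    // Sends the body as a form POST and returns the response HTML.
    // Passing the same CookieContainer to every request keeps the login session alive.
    public static string Post(string url, string body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;

        byte[] payload = Encoding.UTF8.GetBytes(body);
        request.ContentLength = payload.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A login would then be one `Post` call with the user name and password fields, followed by further requests for the detail pages that reuse the same `CookieContainer`. "Clicking" a plain link is just a GET request to the link's `href`.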