Need Help in building a "robot" that extracts data from HTTP request

Question

I am building a website in ASP.NET and C#. One of its components logs in, on the user's behalf, to a website where the user has an account (for example, a cellular phone company), takes information from that site, and stores it in our database.

I think this is called "scraping".

Are there any products that already do this that I can integrate with my software?

I don't need a standalone application; I need some sort of SDK that I can integrate with my C# code.

Thanks,

Koby

Try [Selenium](http://seleniumhq.org/download/). You'll need an interactive desktop to run the browser, though, so it might not be easy to set up as a service. – Rup, Oct 12 '11 at 14:08
Andrey - Actually, posting this question is part of the research. Why not use the experience of others if there are people willing to share their knowledge? That is what this site is all about. Believe it or not, I am doing research. – Koby Mizrahy, Oct 12 '11 at 14:17

score 2 · Accepted Answer · edited May 23 '17 at 10:34

Use the HtmlAgilityPack to parse the HTML that you get from a web request once you've logged in.

See here for logging in: "Login to website, via C#"
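To make the login step more concrete, here is a minimal sketch using only the .NET `HttpClient` stack. It assumes the target site uses a plain HTML login form posted as `application/x-www-form-urlencoded`, with no hidden anti-forgery fields and no JavaScript-driven login; the class name `LoggedInClient` and the field names `username`/`password` are hypothetical. The key idea is that a single `CookieContainer` is shared by every request, so the session cookie set by the login response is sent along with the later page requests.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal sketch: log in by POSTing a login form, then reuse the same
// HttpClient (and therefore the same CookieContainer) for later requests.
// Form field names and URLs are hypothetical placeholders.
public sealed class LoggedInClient : IDisposable
{
    // Stores the session cookies that the site sets when login succeeds.
    public CookieContainer Cookies { get; } = new CookieContainer();

    private readonly HttpClient _client;

    public LoggedInClient()
    {
        // The handler attaches cookies from Cookies to every request
        // and stores Set-Cookie headers from responses back into it.
        var handler = new HttpClientHandler { CookieContainer = Cookies, UseCookies = true };
        _client = new HttpClient(handler); // disposing _client also disposes the handler
    }

    // Builds an application/x-www-form-urlencoded body. The names "username"
    // and "password" are assumptions; they must match the name attributes of
    // the <input> elements in the real login form.
    public static FormUrlEncodedContent BuildLoginForm(string user, string password) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("username", user),
            new KeyValuePair<string, string>("password", password),
        });

    public async Task LoginAsync(Uri loginUrl, string user, string password)
    {
        using var content = BuildLoginForm(user, password);
        using var response = await _client.PostAsync(loginUrl, content);
        // Only checks the HTTP status; some sites return 200 even for a failed login.
        response.EnsureSuccessStatusCode();
    }

    // Downloads a page with the session cookies attached.
    public Task<string> GetPageAsync(Uri url) => _client.GetStringAsync(url);

    public void Dispose() => _client.Dispose();
}
```

Typical use would be to call `LoginAsync` once with the (hypothetical) login URL and the user's credentials, then call `GetPageAsync` for each account page and hand the returned HTML to the HtmlAgilityPack.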

answered Oct 12 '11 at 14:10

George Duckett

Thanks George. Logging in to the website is really the part I was missing. – Koby Mizrahy, Oct 12 '11 at 14:18

score 1 · Answer 2 · answered Oct 12 '11 at 14:09

I haven't found any product that does this well so far.
One way to handle it is to:
- make the requests yourself
- use http://htmlagilitypack.codeplex.com/ to extract the important information from the downloaded HTML
- save the extracted information yourself
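The extraction step in the list above could look something like the following sketch. It assumes the `HtmlAgilityPack` NuGet package is referenced; `PageExtractor` and the XPath expressions used with it are hypothetical, and real expressions have to be written against the target site's actual markup.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

#nullable enable

// Sketch of the extraction step: pull values out of a downloaded page with XPath.
public static class PageExtractor
{
    // Returns the trimmed text of the first node matching xpath, or null if none matches.
    public static string? ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode? node = doc.DocumentNode.SelectSingleNode(xpath);
        // InnerText may leave entities such as &amp; encoded; DeEntitize decodes them.
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Returns the trimmed text of every node matching xpath (empty list if none).
    public static List<string> ExtractAll(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection? nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return result;
        foreach (HtmlNode n in nodes)
            result.Add(HtmlEntity.DeEntitize(n.InnerText).Trim());
        return result;
    }
}
```

For example, `PageExtractor.ExtractText(html, "//span[@id='balance']")` would return the trimmed text of a hypothetical `<span id="balance">` element, or `null` if the page has no such element. The result can then be saved with whatever data access code the site already uses.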

The thing is that, depending on the context, there are so many things to tune or configure that you would need a very large product, and it still wouldn't match the performance/accuracy of a custom solution:
a) multithreading control
b) extraction rules
c) persistence control
d) web spidering (i.e. how the next link to parse is chosen)
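For point (a), one common way to keep concurrency under control in .NET is to gate each request with a `SemaphoreSlim`. The sketch below is a hedged take on that idea: `ThrottledFetcher` and its `fetch` delegate are hypothetical names, and the delegate stands in for whatever actually downloads a page (for instance an `HttpClient.GetStringAsync` call).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sketch of "multithreading control": run page fetches concurrently,
// but never more than maxConcurrency at the same time.
public static class ThrottledFetcher
{
    public static async Task<IReadOnlyList<string>> FetchAllAsync(
        IEnumerable<Uri> urls,
        Func<Uri, Task<string>> fetch,   // placeholder for the real download call
        int maxConcurrency)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

        async Task<string> FetchOne(Uri url)
        {
            await gate.WaitAsync();           // wait for a free slot
            try { return await fetch(url); }
            finally { gate.Release(); }       // free the slot even if fetch throws
        }

        // Task.WhenAll returns results in the same order as the input URLs.
        return await Task.WhenAll(urls.Select(FetchOne));
    }
}
```

Keeping the download behind a delegate also makes the throttling logic testable without network access.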

score 0 · Answer 3 · answered Oct 12 '11 at 14:16

Check the Web scraping Wikipedia entry.

However, since what you need to acquire via web scraping is application-specific most of the time, I would say it may be more efficient to scrape whatever you need yourself, directly from the web response stream.
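As a rough illustration of scraping directly from the response stream, the sketch below reads the body line by line and returns as soon as the wanted value is found, without building a DOM. It assumes the value sits between two fixed markers on a single line of the HTML; `StreamScraper` and the marker strings are hypothetical, and this approach breaks easily if the site changes its markup.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

#nullable enable

// Sketch of application-specific scraping straight from a response stream.
public static class StreamScraper
{
    // Returns the trimmed text between startMarker and endMarker on the first
    // line containing both, or null if no such line exists.
    // Note: disposing the StreamReader also disposes the underlying stream.
    public static async Task<string?> ExtractBetweenAsync(Stream body, string startMarker, string endMarker)
    {
        using var reader = new StreamReader(body);
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            int start = line.IndexOf(startMarker, StringComparison.Ordinal);
            if (start < 0) continue;              // marker not on this line
            start += startMarker.Length;
            int end = line.IndexOf(endMarker, start, StringComparison.Ordinal);
            if (end >= 0) return line.Substring(start, end - start).Trim();
            // Value spans multiple lines: not handled by this sketch, keep scanning.
        }
        return null;
    }
}
```

With `HttpClient`, the stream could come from `await client.GetStreamAsync(url)`, so reading stops as soon as the value has been found.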

answered Oct 12 '11 at 14:16

apokryfos
