0

I am building a website in ASP.NET and C#. One of its components logs in, on the user's behalf, to a website where the user has an account (for example, a cellular phone company), takes information from that site, and stores it in our database.

I think this is called "scraping".

Are there any products that already do this and that I can integrate with my software?

I don't need a standalone application; I need some sort of SDK that I can integrate with my C# code.

Thanks,

Koby

Koby Mizrahy
  • Zero research detected. – Andrey Oct 12 '11 at 14:06
  • Try [Selenium](http://seleniumhq.org/download/). You'll need an interactive desktop to run the browser, though, so might not be easy to set up as a service. – Rup Oct 12 '11 at 14:08
  • Andrey - Actually, posting this question is part of the research. Why not use the experience of others if there are people who are willing to share their knowledge? This is what this site is all about. Believe it or not, I am doing research. – Koby Mizrahy Oct 12 '11 at 14:17

3 Answers

2

Use the HtmlAgilityPack to parse the HTML that you get from a web request once you've logged in.

See here for logging in: Login to website, via C#
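
As a rough illustration of the parsing step, here is a minimal sketch using HtmlAgilityPack. The HTML snippet, the `usage` table id, and the `label`/`value` class names are made up for the example; a real page will need its own XPath expressions.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

static class ParseSketch
{
    // Extracts label/value pairs from a (hypothetical) usage table.
    public static Dictionary<string, string> ExtractUsage(string html)
    {
        var result = new Dictionary<string, string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard against it.
        var rows = doc.DocumentNode.SelectNodes("//table[@id='usage']//tr");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            var label = row.SelectSingleNode("td[@class='label']");
            var value = row.SelectSingleNode("td[@class='value']");
            if (label == null || value == null) continue; // skip rows with another layout
            result[label.InnerText.Trim()] = value.InnerText.Trim();
        }
        return result;
    }

    static void Main()
    {
        // Stand-in for the HTML returned by the site after logging in.
        string html = @"<html><body><table id='usage'>
            <tr><td class='label'>Minutes used</td><td class='value'>123</td></tr>
            <tr><td class='label'>Balance</td><td class='value'>45.60</td></tr>
            </table></body></html>";

        foreach (var pair in ExtractUsage(html))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

The null checks matter in practice, since scraped pages tend to change layout without notice.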

George Duckett
1

I haven't found any product that does this well so far.
One way to handle it is to:
- make the requests yourself
- use http://htmlagilitypack.codeplex.com/ to extract the important information from the downloaded HTML
- save the extracted information yourself
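
The first step (making the requests yourself, including the login) might look roughly like the sketch below. It assumes the site uses a plain form post and a session cookie; the URLs and the `username`/`password` field names are hypothetical, and many real sites also require hidden form tokens or extra headers.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class LoginSketch
{
    // Logs in with a form post, then downloads a page using the same session.
    // loginUrl, dataUrl and the form field names are placeholders.
    public static async Task<string> FetchAfterLoginAsync(
        Uri loginUrl, Uri dataUrl, string user, string password)
    {
        // The cookie container keeps the session cookie set by the login response.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using (var client = new HttpClient(handler))
        {
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                { "username", user },
                { "password", password }
            });
            HttpResponseMessage login = await client.PostAsync(loginUrl, form);
            login.EnsureSuccessStatusCode();

            // Same client, same cookies: this request is sent as the logged-in user.
            return await client.GetStringAsync(dataUrl);
        }
    }
}
```

The returned HTML string can then be handed to HtmlAgilityPack for extraction, and the extracted values saved with whatever data access code the application already uses.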

The thing is that, depending on the context, there are so many things to tune and configure that you would need a very large product, and it still would not match the performance or accuracy of a custom solution:
a) multithreading control
b) extraction rules
c) persistence control
d) web spidering (how the next link to parse is chosen)
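
To give an idea of what point (a) can involve, here is one possible sketch that caps the number of simultaneous requests with a `SemaphoreSlim`. The `Throttle` helper and its parameter names are invented for this example; it is only one of several ways to limit concurrency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class Throttle
{
    // Runs 'work' for every item, with at most 'maxParallel' calls in flight.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxParallel, Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // wait for a free slot
                try { return await work(item); }
                finally { gate.Release(); }      // free the slot even on failure
            }).ToList();

            return await Task.WhenAll(tasks);
        }
    }
}
```

In a scraper, `items` would be the URLs to download and `work` the function that fetches and parses one page; a small `maxParallel` also helps avoid overloading the target site.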

Giedrius
0

Check the Web Scraping Wikipedia Entry.

However, I would say that since what we need to acquire via web scraping is application-specific most of the time, it may be more efficient to scrape whatever you need directly from the web response stream yourself.

apokryfos