I am really new to C# programming and would like some help, if possible. I have a website (a shopping website) with product data: name, price, description, etc. The website has a search capability, so I would like to query the search link and extract only the important data (product id, name, price and description). When I perform a search I get many pages, and every time I press Next I get a new page with another list of products. How can I automate these tasks?

I searched a lot on the internet and found that I could use WebClient with regular expressions. I think I would need one loop over the search result pages and another over the content of each page. What do you think?

Website Example.

I'd appreciate any help.

Mahmoud Gamal
  • Why do you need scraping? If you have a website, you have the data. Just get it from where it is stored. – L.B Jul 22 '12 at 16:09

1 Answer

What you're describing is called scraping.

What you'll want is to use something like HtmlAgilityPack to download the page and parse it into a DOM. Then you find the nodes you're interested in (for example with XPath) and read their inner text.
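To make that a bit more concrete, here is a minimal sketch of the parsing step. It assumes the HtmlAgilityPack NuGet package is installed. The class and method names (`ProductScraper`, `ParseProducts`) and all the XPath expressions and CSS class names are made up for illustration; you would have to inspect the real page source to find the right ones.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ProductScraper
{
    // Parses the HTML of one search-result page and returns (name, price) pairs.
    // ASSUMPTION: each product sits in <div class="product"> with an
    // <a class="name"> and a <span class="price"> inside it. These selectors
    // are hypothetical placeholders, not the real markup of any site.
    public static List<(string Name, string Price)> ParseProducts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Name, string Price)>();

        // SelectNodes returns null (not an empty list) when nothing matches.
        var productNodes = doc.DocumentNode.SelectNodes("//div[@class='product']");
        if (productNodes == null)
            return results;

        foreach (var product in productNodes)
        {
            // The leading "." makes the XPath relative to the current product node.
            var nameNode = product.SelectSingleNode(".//a[@class='name']");
            var priceNode = product.SelectSingleNode(".//span[@class='price']");
            if (nameNode == null || priceNode == null)
                continue; // skip entries that do not have the expected shape

            // DeEntitize turns HTML entities such as &amp; back into characters.
            string name = HtmlEntity.DeEntitize(nameNode.InnerText).Trim();
            string price = HtmlEntity.DeEntitize(priceNode.InnerText).Trim();
            results.Add((name, price));
        }

        return results;
    }
}
```

To work on a live page instead of a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that you can query the same way. Keeping the parsing separate from the download makes it easy to try the selectors on a saved copy of the page first.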

The whole process is rather complicated, but at least I've sent you off in the right direction. For the most part, search urls tend to have the same format.

In your link for instance

http://cdon.se/hemelektronik/advanced-search?manufacturer-id=&title=.&title-matchtype=1&genre-id=&page-size=15&sort-order=142&page=2

You can change the value of the 'page' parameter to another number and go through all the pages that way.
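As a rough sketch of that idea, the snippet below generates the URL for each page by substituting the page number into the link above. The names `SearchPages`, `BuildPageUrl` and `PageUrls` are my own, and the page range is an assumption: the site does not tell you up front how many pages there are, so in practice you would stop when a page comes back with no products.

```csharp
using System;
using System.Collections.Generic;

public static class SearchPages
{
    // The search URL from above, with the page number replaced by {0}.
    private const string UrlTemplate =
        "http://cdon.se/hemelektronik/advanced-search?manufacturer-id=&title=." +
        "&title-matchtype=1&genre-id=&page-size=15&sort-order=142&page={0}";

    // Builds the URL of a single result page.
    public static string BuildPageUrl(int page)
    {
        return string.Format(UrlTemplate, page);
    }

    // Yields the URLs for firstPage..lastPage (inclusive), one per page.
    public static IEnumerable<string> PageUrls(int firstPage, int lastPage)
    {
        for (int page = firstPage; page <= lastPage; page++)
        {
            yield return BuildPageUrl(page);
        }
    }
}
```

You can then `foreach` over `SearchPages.PageUrls(1, 10)`, download each URL (for example with `WebClient.DownloadString`), and extract the products from the returned HTML. Adding a short delay between requests is a good idea so you don't hammer the site.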

Added: Also don't TRY to use regex to parse html. It drove one particular person mad...

RegEx match open tags except XHTML self-contained tags

Haedrian
  • What do you mean by changing 'page' to something else? – Hicham Lee Jul 22 '12 at 16:51
  • Look at the URL. You're passing a number of parameters, and one of them (in this case) is page=2. You can pass page=3, page=4 and so on to get the rest. Put the request in a loop and increase the value you pass on each iteration. – Haedrian Jul 22 '12 at 18:21