I'm trying to get the first 1000 search result links from Google with C#. So far, I've modified Shiv Kumar's "Finding links on a Web page" to look for links in: string webpageUrl = "http://www.google.com/search?hl=en&num=100&q=" + "concept"; However, since the Google page doesn't show all 1000 results, I need to find a way to get the rest of them. Could that be accomplished without the Google API?

gilibi

AEMLoviji
  • Take a look here: https://stackoverflow.com/questions/22657548/is-it-ok-to-scrape-data-from-google-results/22703153#22703153 What you are looking for is called "scraping" in IT. – John Jun 08 '17 at 02:54

1 Answer

I'd recommend you use the API.

"Screen scraping" HTML is problematic and requires frequent maintenance work, especially on a site like Google, whose markup will almost certainly change several times a year and which often uses redirects to track link usage.


Alternatively, if you really want to use the HTML route, take a look at the query parameters, e.g. "&start=10"; this should allow you to iterate over the result pages.

But there's no guarantee that the query parameters will remain constant forever.
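
As a rough sketch of the paging idea, the following builds one URL per result page by stepping the "start" parameter. The base URL and the "num" parameter come from the question; the class name SearchPageUrls and the defaults of 1000 results in pages of 100 are only illustrative assumptions, and nothing here is a documented Google interface.

```csharp
using System;
using System.Collections.Generic;

public static class SearchPageUrls
{
    // Yields one URL per result page by stepping the "start" offset.
    // Assumption: "num" and "start" behave as described in this thread;
    // Google may change or ignore them at any time.
    public static IEnumerable<string> Build(string query, int total = 1000, int pageSize = 100)
    {
        // Guard against a non-positive step, which would never terminate.
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        for (int start = 0; start < total; start += pageSize)
        {
            yield return "http://www.google.com/search?hl=en"
                + "&num=" + pageSize
                + "&start=" + start
                + "&q=" + Uri.EscapeDataString(query); // escape spaces and symbols
        }
    }
}
```

Each URL can then be fed to the existing link-finding code in place of the single webpageUrl string.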

Stuart
  • Definitely use the API if possible. Otherwise, write code that does individual queries with: "&start=100&num=100", "&start=200&num=100", etc. Be aware, though, that Google frowns on screen scraping and will throttle you if you do it too much. I would recommend a delay of at least 15 seconds between requests. – Jim Mischel Mar 19 '11 at 14:46
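
A minimal sketch of the delay suggested in the comment above, assuming the page URLs have already been built. ThrottledFetcher and FetchAllAsync are hypothetical names; the download itself is passed in as a delegate so the pause logic stays separate from whichever HTTP client is used.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledFetcher
{
    // Downloads each URL in order, waiting `delay` between consecutive requests.
    // `fetch` is supplied by the caller (hypothetical hook, e.g. an HttpClient call).
    public static async Task<List<string>> FetchAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        TimeSpan delay)
    {
        var pages = new List<string>();
        bool first = true;
        foreach (string url in urls)
        {
            // No pause before the first request, only between requests.
            if (!first) await Task.Delay(delay);
            first = false;
            pages.Add(await fetch(url));
        }
        return pages;
    }
}
```

With System.Net.Http, one way to call it would be FetchAllAsync(urls, url => client.GetStringAsync(url), TimeSpan.FromSeconds(15)), where client is an HttpClient instance. Even with the delay, requests may still be throttled or blocked.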