
I thought this would be fairly simple, but it's proving challenging. Google uses https:// now, and Bing redirects to strip out the http://.

How can I grab the top 5 URLs for a given search term?

I've tried several methods (including loading results into an iframe), but keep hitting brick walls with everything I try.

I wouldn't even need a proxy, as I'm talking about a very small number of results to harvest, and I'll only use it for 20-30 terms once every few months. Hardly enough to trigger a backlash from the search giants.

Any help would be much appreciated!

Here's one example of what I've tried:

```php
$query = urlencode("test");

// Note: $query is already encoded above, so don't urlencode() it a second
// time in the URL (the original snippet double-encoded it).
preg_match_all(
    '/<a title=".*?" href="(.*?)"/',
    file_get_contents("http://www.bing.com/search?q=" . $query),
    $matches
);

echo implode("<br>", $matches[1]);
```
  • [Wouldn't you prefer an HTML Parser instead?](http://stackoverflow.com/a/1732454/102937) – Robert Harvey Nov 15 '13 at 22:14
  • For such a small amount of data, wouldn't a paper and pencil suit you? –  Nov 15 '13 at 22:15
  • I have http://sourceforge.net/projects/simplehtmldom/ but can't seem to use it properly. All I really need is the `` tags from Bing's SERP. – Casey Dwayne Nov 15 '13 at 22:16
  • @MikeW The point is to make it automated, so I don't have to manually retrieve the top 5 or so URLs for each of the 20-30 terms. Work hard now, work easy later. – Casey Dwayne Nov 15 '13 at 22:17
  • 1
    Take a look here http://stackoverflow.com/questions/22657548/is-it-ok-to-scrape-data-from-google-results – John Jan 13 '17 at 00:20

2 Answers


There are three main ways to do this. The first is to use the official API for the search engine you're targeting: Google has one, and most of the others do too. These APIs are usually volume-limited, but for the numbers you're talking about, you'll be well within the limits.
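As a rough sketch of the API route, here's what a call to the Google Custom Search JSON API could look like in PHP. The `YOUR_API_KEY` and `YOUR_SEARCH_ENGINE_ID` values are placeholders you'd get from Google's consoles, and the response here is a stub so the parsing is self-contained (a real call would fetch `$url` instead):

```php
<?php
// Placeholders -- substitute your own credentials from Google.
$apiKey = "YOUR_API_KEY";
$cx     = "YOUR_SEARCH_ENGINE_ID";
$term   = "test";

// Build the Custom Search request URL; num=5 asks for the top 5 results.
$url = "https://www.googleapis.com/customsearch/v1?" . http_build_query([
    "key" => $apiKey,
    "cx"  => $cx,
    "q"   => $term,
    "num" => 5,
]);

// In real use: $json = file_get_contents($url);
// Stubbed response so this sketch runs without credentials:
$json = '{"items":[{"link":"http://example.com/one"},{"link":"http://example.com/two"}]}';

$data  = json_decode($json, true);
$items = isset($data["items"]) ? $data["items"] : [];
$links = array_map(function ($item) { return $item["link"]; }, $items);

echo implode("<br>", $links);
```

The response puts each result's URL in the `link` field of the `items` array, so extracting the top 5 is just a map over that array.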

The second way is to use a scraper program to visit the search page, enter a search term, and submit the associated form. Since you've specified PHP, I'd recommend Goutte. Internally it uses Guzzle and Symfony Components, so it must be good! The README at the above link shows you how easy it is. Selection of HTML fragments is done using either XPath or CSS, so it is flexible too.
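To illustrate the kind of XPath selection Goutte performs, here's a dependency-free sketch using PHP's built-in `DOMDocument` and `DOMXPath`. The HTML is a stub standing in for a downloaded results page, and the `b_algo` class name is just illustrative; the real markup will differ and tends to change over time:

```php
<?php
// Stub HTML standing in for a fetched results page (real markup differs).
$html = '<html><body>'
      . '<li class="b_algo"><a href="http://example.com/a">A</a></li>'
      . '<li class="b_algo"><a href="http://example.com/b">B</a></li>'
      . '</body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);   // @ suppresses warnings on messy real-world HTML
$xpath = new DOMXPath($doc);

// Select the href attribute of each result link via XPath.
$links = [];
foreach ($xpath->query('//li[@class="b_algo"]//a/@href') as $href) {
    $links[] = $href->nodeValue;
}

echo implode("<br>", $links);
```

An XPath (or CSS) selector like this is far more robust than a regex against raw HTML, since it tolerates attribute reordering and whitespace changes.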

Lastly, given the low volume of required scrapes, consider downloading a free software package from Import.io. This lets you build a scraper using a point-and-click interface, and it learns how to scrape various areas of the page before storing the data in a local or cloud database.


You can also use a third-party service like SerpApi to get Google results.

It should be pretty easy to integrate:

```php
$query = [
    "q" => "Coffee",
    "google_domain" => "google.com",
];

$serp = new GoogleSearchResults();
// PHP method calls use ->, not the Ruby-style "." in the original snippet.
$json_results = $serp->json($query);
```

GitHub project.
