4

Assume I have the entire HTML of a Google search results page. Does anyone know of any existing code (Ruby?) to scrape/parse the first page of Google search results? Ideally it would handle the Shopping Results and Video Results sections that can spring up anywhere.

If not, what's the best Ruby-based tool for screenscraping in general?

To clarify: I'm aware that it's difficult/impossible to get Google search results programmatically/API-wise AND that simply cURLing results pages has a lot of issues. There's consensus on both of these points here on Stack Overflow. My question is different.

  • I suggest taking a look at the Google rank checker ( http://google-rank-checker.squabbel.com ). It's not Ruby; it's written in PHP. But it's open source and solves all the tasks you need. You don't seem to be fixed on Ruby, and I've personally used PHP (console scripts) for many such projects (also in production environments). Anyway, even if you write in Ruby you'll find the PHP code useful, as some tasks when scraping Google can be quite tricky (delays, IPs, DOM parsing, sending correct GET parameters, etc.). – John Feb 29 '12 at 00:15
  • This is an OLD question so anyone using it to justify using scraping instead of Google's API needs to rethink their logic. Use the API, that's what it's there for. – the Tin Man Mar 06 '20 at 04:36
  • Use "[Google Custom Search](https://developers.google.com/custom-search/docs/tutorial/introduction)" instead. – the Tin Man Mar 21 '20 at 21:10

6 Answers

9

This should be a very simple thing; have a look at the "Screen Scraping with ScrAPI" screencast by Ryan Bates. You can also do without scraping libraries and just stick to something like Nokogiri.


From Nokogiri's documentation:

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML::Document for the page we're interested in...
# (On modern Ruby, open-uri no longer patches Kernel#open; use URI.open.)

doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end
khelll
  • And you can do `link['href']` to get the href of the link ;). – Dorian Jun 02 '12 at 23:29
  • Ryan has two screencasts on scraping: the one on ScrAPI mentioned above, and [one on Nokogiri](http://railscasts.com/episodes/190-screen-scraping-with-nokogiri) which uses code more similar to the code in this answer. – notapatch Jul 06 '13 at 17:48
  • It seems that Google changed the layout of the page and this code is not working anymore. – reducing activity Aug 23 '18 at 16:30
3

I'm unclear as to why you want to be screen scraping in the first place. Perhaps the REST search API would be more appropriate? It will return the results in JSON format, which will be much easier to parse, and save on bandwidth.

For example, if your search was 'foo bar', you could just send a GET request to http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=foo+bar and handle the response.
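As a sketch of that request-and-parse flow using only Ruby's standard library (the AJAX Search API is long deprecated, so the JSON body below is an illustrative sample in the shape the old API documented, not a live response):

```ruby
require 'json'
require 'uri'

# Build the request URL from the endpoint mentioned above
uri = URI('http://ajax.googleapis.com/ajax/services/search/web')
uri.query = URI.encode_www_form(v: '1.0', q: 'foo bar')
puts uri
# => http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=foo+bar

# The API returned JSON along these lines (sample body, field names per the old API docs)
body = '{"responseData":{"results":[{"titleNoFormatting":"Foo Bar","unescapedUrl":"http://example.com"}]}}'

JSON.parse(body)['responseData']['results'].each do |r|
  puts "#{r['titleNoFormatting']} - #{r['unescapedUrl']}"
end
```

In a real script you would fetch `uri` with `Net::HTTP.get_response(uri)` and parse `response.body` the same way.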

For more information, see "Google Search REST API" or Google's developer page.

pkaeding
0

I would suggest HTTParty + Google's Ajax search API.

knoopx
  • As written this is hardly an answer. Point to the appropriate pages, show why it's a usable answer with some code examples. – the Tin Man Mar 06 '20 at 04:41
-1

You should be able to accomplish your goal easily with Mechanize.

If you already have the results, all you need is Hpricot or Nokogiri.

Avdi
  • You're welcome! And see my update: if you already have the results, Mechanize may be overkill. – Avdi Oct 08 '09 at 19:08
  • Hpricot is no longer supported, so don't go there. Nokogiri is alive and well, and does support Hpricot's syntax, but don't use it; use the normal Nokogiri syntax as demonstrated in the cheat sheet and tutorials. – the Tin Man Mar 06 '20 at 04:43
  • Unfortunately, because Google uses DHTML for more and more of the page, scraping is more difficult than it used to be. Instead use "[Google Custom Search](https://developers.google.com/custom-search/docs/tutorial/introduction)". – the Tin Man Mar 21 '20 at 21:12
-1

Scraping has become harder and harder as Google keeps changing and expanding how the results are structured (rich snippets, knowledge graph, direct answers, etc.). We built a service that handles part of this complexity, and we have a Ruby library. It's pretty straightforward to use:

require 'google_search_results'

query = GoogleSearchResults.new(q: "coffee")

# Parse the Google results into a Ruby hash
hash_results = query.get_hash
Hartator
-1

I don't know Ruby-specific code, but this Google scraper could help you. It's an online demo tool that scrapes and parses Google results. The most interesting thing there is the article explaining the parsing process in PHP, but it's applicable to Ruby and any other programming language.

Lix