
I know how to find an element using Nokogiri. I know how to click a link using Mechanize. But I can't figure out how to find a specific link and click it. This seems like it should be really easy, but for some reason I can't find a solution.

Let's say I'm just trying to click on the first result on a Google search. I can't just click the first link with Mechanize, because the Google page has a bunch of other links, like Settings. The search result links themselves don't seem to have class names, but they're enveloped in <h3 class="r"></h3>.

I could just use Nokogiri to follow the href value of the link like so:

document = open("https://www.google.com/search?q=stackoverflow")
parsed_content = Nokogiri::HTML(document.read)
href = parsed_content.css('.r').children.first['href']
# href is equal to "/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;url=https%3A%2F%2Fstackoverflow.com%2F"
new_document = open(href)

but it's not a direct URL, and opening it gives an error. The data-href value is a direct URL, but I can't figure out how to get it: doing the same thing with ...first['data-href'] returns nil.

Anyone know how I can just find the first .r element on the page and click the link inside it?

Here's the start to my action:

require 'open-uri'
require 'nokogiri'
require 'mechanize'
document = open("https://www.google.com/search?q=stackoverflow")
parsed_content = Nokogiri::HTML(document.read)

Here's the .r element on the Google search results page:

<h3 class="r">
  <a href="/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;url=https%3A%2F%2Fstackoverflow.com%2F" data-href="https://stackoverflow.com/">Stack Overflow</a>
</h3>

1 Answer


Make sure the code in your question is the code you actually ran. It looks like it is not: the URL isn't surrounded by quotes, and the CSS selector should be .r a, not r. You use .r a because you want the link inside elements with the r class.

Anyway, you can use the approach detailed here like so:

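require 'open-uri'
require 'nokogiri'
require 'uri'

base_url = "https://www.google.com/search?q=stackoverflow"
# URI.open is open-uri's entry point; bare Kernel#open no longer fetches URLs on Ruby 3.0+
document = URI.open(base_url)
parsed_content = Nokogiri::HTML(document.read)
# href of the first child of the first .r element (the result link)
href = parsed_content.css('.r').first.children.first['href']
# the href is relative ("/url?..."), so resolve it against the search URL
new_url = URI.join base_url, href
new_document = URI.open(new_url)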

I tested this, and following new_url does redirect to Stack Overflow as expected.
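If you would rather skip the redirect, the sample href in the question appears to carry the destination in its url query parameter. Here is a rough sketch that pulls it out using only the standard library. The helper name target_from_redirect is made up for illustration, and it assumes the href keeps the "/url?...&url=..." shape shown in the question, which Google may change at any time.

```ruby
require 'uri'

# Hypothetical helper: resolve a relative redirect href against the page URL,
# then return the decoded "url" query parameter if one is present.
# Falls back to the absolute redirect URL when there is no such parameter.
def target_from_redirect(base_url, href)
  absolute = URI.join(base_url, href)
  # decode_www_form splits the query string and percent-decodes each value
  params = URI.decode_www_form(absolute.query.to_s).to_h
  params['url'] || absolute.to_s
end

base_url = "https://www.google.com/search?q=stackoverflow"
href = "/url?sa=t&rct=j&q=&esrc=s&source=web&url=https%3A%2F%2Fstackoverflow.com%2F"

puts target_from_redirect(base_url, href)
# prints "https://stackoverflow.com/"
```

Note that the href is written with plain & here: Nokogiri decodes &amp; entities when it parses attribute values, so that is what ['href'] should hand back.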

max pleaner
  • Good catch, I typed instead of copy + paste. This is weird though, this exact code doesn't work for me, but using `('.r').first.children.first['href']` and just `href` instead of `href.value`, does. –  Nov 12 '17 at 23:38
  • @JosefKrazinsky you're right, must have made an error copy-pasting myself – max pleaner Nov 12 '17 at 23:45