I am trying to fetch results from Google and save them to a file, but the results are repeated. Also, when I save them to the file, only the last link is written.

require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.google.com/videohp')

google_form = page.form('f')
google_form.q = 'ruby'

page = agent.submit(google_form, google_form.buttons.first)
linky = page.links
for link in linky do
  if link.href.to_s =~/url.q/
    str=link.href.to_s
    strList=str.split(%r{=|&})
    $url=strList[1].gsub("h%3Fv%3D", "h?v=")
    $heading = link.text
    $res = $url
    if ($url.to_s.include? "webcache")
      next
    elsif ($url.to_s.include? "channel")
      next
    end
    puts $res
  end
end

for link in linky do
  File.open("aaa.htm", 'w') { |file| file.write($res) }
end
  • 1
    `file.write($res)` always writes the value of `$res` to the file. You probably want to do something with `link` instead (or move the writing into the first loop). Besides, you should use `each` instead of `for` and avoid global variables (those starting with `$`). – Stefan Sep 21 '17 at 11:20
  • Thanks @Stefan I will correct them. –  Sep 21 '17 at 12:05

3 Answers

require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.google.com/videohp')

google_form = page.form('f')
google_form.q = 'ruby'

page = agent.submit(google_form, google_form.buttons.first)
linky = page.links
for link in linky do
  if link.href.to_s =~/url.q/
    str=link.href.to_s
    strList=str.split(%r{=|&})
    $url=strList[1].gsub("h%3Fv%3D", "h?v=")
    $heading = link.text
    $res = $url
    if ($url.to_s.include? "webcache")
      next
    elsif ($url.to_s.include? "channel")
      next
    end
    puts $res
    # Append ('a') instead of truncating ('w'), so earlier links are not overwritten
    File.open("aaa.htm", 'a') { |file| file.puts($res) }
  end
end
Mukarram Ali

It looks like you don't really know Ruby.

Please do not use global variables unless you really need them. In this case you don't; this is not PHP. A plain local variable assignment is enough. :)

To iterate through a collection, use the dedicated #each method. In your case you want to filter the collection of links and keep those that match your needs: valid_links = links.select { |link| ... } (filter is an alias of select from Ruby 2.6 onwards).

The block should return true for links that match your conditions and false for those that don't.

Then open the file once and iterate over valid_links inside the File.open block, writing each link there.
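
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the links are represented as plain href strings (with Mechanize you would start from page.links), and the names valid_link?, write_links and sample_hrefs are made up for this example.

```ruby
# Hypothetical helper: true for result links worth keeping, false otherwise.
def valid_link?(href)
  href.match?(/url.q/) &&
    !href.include?("webcache") &&
    !href.include?("channel")
end

# Hypothetical helper: open the file once and write every link inside the block.
def write_links(path, links)
  File.open(path, "w") do |file|
    links.each { |href| file.puts(href) }
  end
end

# Made-up sample data standing in for page.links.map { |link| link.href.to_s }
sample_hrefs = [
  "/url?q=https://www.youtube.com/watch%3Fv%3Dabc&sa=U",
  "/url?q=https://webcache.example.com/cached&sa=U",
  "/search?q=ruby",
  "/url?q=https://www.youtube.com/channel/xyz&sa=U"
]

# select keeps only the elements for which the block returns a truthy value
valid_links = sample_hrefs.select { |href| valid_link?(href) }
write_links("aaa.htm", valid_links)
```

Because the file is opened only once, the 'w' mode truncates it a single time and every kept link ends up on its own line.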

konole

This is really two questions, and it's clear you're just starting out with Ruby. You will get better with practice, but it would help to keep reading up on the fundamentals of the language; this looks a bit like PHP written in Ruby.

First, the links are quite probably showing up multiple times because they are present more than once in the page. You aren't doing anything to catch that.

Secondly, you have a global variable (these tend to cause problems and should only really be used if you can't find an alternative) into which you put each URL, but every time you do that, you overwrite what you had before. So every time you execute $res = $url, you replace whatever was in $res with the latest $url, and only the last one is left when you write the file.

If you made an array instead of the single value $res (it can be a local variable too), you could use myArray.push(url) to add each new URL to it.

When you have all the URLs in your array, you could use myArray.uniq to get rid of the duplicates before you write them out to your file.
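
As a rough sketch of that idea, assuming the hrefs are available as plain strings (the collect_urls helper and the sample data are invented for illustration; the split and gsub calls are taken from the question's code):

```ruby
# Hypothetical helper: gather the wanted URLs into a local array.
def collect_urls(hrefs)
  urls = [] # local array instead of the global $res
  hrefs.each do |href|
    next unless href.match?(/url.q/)
    url = href.split(/=|&/)[1].gsub("h%3Fv%3D", "h?v=")
    next if url.include?("webcache") || url.include?("channel")
    urls.push(url) # add to the list instead of overwriting a single value
  end
  urls
end

# Made-up sample data; the first two entries are deliberate duplicates.
sample_hrefs = [
  "/url?q=https://www.youtube.com/watch%3Fv%3Dabc&sa=U",
  "/url?q=https://www.youtube.com/watch%3Fv%3Dabc&sa=U",
  "/url?q=https://www.youtube.com/watch%3Fv%3Ddef&sa=U"
]

# uniq returns a new array without the repeated entries
unique_urls = collect_urls(sample_hrefs).uniq

# puts with an array writes each element on its own line
File.open("aaa.htm", "w") { |file| file.puts(unique_urls) }
```

Calling uniq once at the end keeps the collecting loop simple; checking urls.include?(url) before each push would be an alternative.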

glenatron