0

I'm writing a script using Ruby, Mechanize and Nokogiri to scrape the source attributes from iframe elements on a webpage, and if there's more than one, store them inside an array for later use.

So I have the following code which works, but my question is; Is there a more elegant way of achieving this? Say, something along the lines of iframe.<some_method_like_length> instead of using the i counter?

i = 0
doc.search("//span/iframe").each do |iframe|
    $ifrmsrc[i] = iframe.attribute("src")
    i += 1
end
i = 0 
#LATER USE :)
$ifrmsrc.length.times do |g|
    puts $ifrmsrc.at(g)
end
matsjoyce
  • 5,744
  • 6
  • 31
  • 38
MattBlack
  • 3
  • 2
  • 2
    This question appears to be off-topic because it is about improving working code and belongs on [codereview.se]. – the Tin Man Dec 02 '14 at 17:57

2 Answers2

1

Sure. Use the << operator to add an item to the end of an Array.

ifrmsrc = []

doc.search("//span/iframe").each do |iframe|
  ifrmsrc << iframe.attribute("src")
end

...or, to be more Rubyish, use Enumerable#map, which executes the given block for each item in an Enumerable and returns a new Array with the results.

ifrmsrc = doc.search("//span/iframe").map {|iframe| iframe["src"] }

(If ifrmsrc already exists and already has data in it that you want, use += instead of = since = will overwrite it. However, from your code I'm guessing this is the only place items will be added to the array, so there's no need to define it ahead of time.)

P.S. Don't use global variables (i.e. variables that start with $). It's just a bad practice.

Community
  • 1
  • 1
Jordan Running
  • 102,619
  • 17
  • 182
  • 182
  • "It's just a bad practice." It's code-smell; Don't use global variables unless it's understand when and why they should be used. – the Tin Man Dec 02 '14 at 17:59
  • 1
    Instead of `iframe.attribute("src")` use `iframe['src']` for conciseness. – the Tin Man Dec 02 '14 at 18:04
  • 1
    For readability, I recommend CSS over XPath, especially for cases like this where the added capabilities of XPath aren't needed. XPath tends to add a lot of visual noise, where `search('span iframe')` is clearer. – the Tin Man Dec 02 '14 at 18:07
0

I personally prefer more XPath:

ifrmsrc = doc.xpath("//span/iframe/@src").map(&:value)

And later, you don't need the index to iterate values:

ifrmsrc.each{ |src| puts src }

Or, if you do need the index for other reasons:

ifrmsrc.each.with_index{ |src,i| puts "Source ##{i} is #{src}" }

Though, if you just want the values, one on each line:

puts ifrmsrc
Phrogz
  • 296,393
  • 112
  • 651
  • 745