8

I'm writing a small Ruby script for myself that saves all of the images from my blog.

I can't figure out how to save the image files after I've identified them. Any help would be much appreciated.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = '[my blog url]'
doc = Nokogiri::HTML(URI.open(url)) # URI.open is needed on Ruby 3.0+; older versions also accepted open(url)

doc.css("img").each do |item|
  #something
end
Phrogz
Zack Shapiro

4 Answers

27
URL = '[my blog url]'

require 'nokogiri' # gem install nokogiri
require 'open-uri' # already part of your ruby install

Nokogiri::HTML(URI.open(URL)).xpath("//img/@src").each do |src|
  uri = URI.join( URL, src.value ).to_s # make absolute uri
  # URI.open is required on Ruby 3.0+; older versions also accepted plain open(uri)
  File.open(File.basename(uri),'wb'){ |f| f.write(URI.open(uri).read) }
end

This uses the technique for converting to absolute URLs from the question "How can I get the absolute URL when extracting links using Nokogiri?"
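To show what the absolute-URL step does on its own, here is a minimal sketch. The helper name `absolute_image_url` and the example.com URLs are made up for illustration; only `URI.join` comes from the answer.

```ruby
require 'uri'

# Hypothetical helper (not from the answer): resolve an <img> src value
# against the URL of the page it was found on.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

page = 'http://example.com/blog/post.html' # placeholder page URL

puts absolute_image_url(page, 'images/cat.png')               # relative to the page's directory
puts absolute_image_url(page, '/static/dog.jpg')              # relative to the site root
puts absolute_image_url(page, 'http://cdn.example.com/a.gif') # already absolute, left unchanged
```

Because already-absolute URLs pass through unchanged, the same call should be safe to apply to every `src` on the page.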

Phrogz
  • I get an error when I use this. "output conversion failed due to conv error, bytes 0xFF 0xC3 0x98 0xC3" – Farhad Jul 07 '17 at 07:38
1

Tip: there's a simple way to get images from a page's head/body using the Scrapifier gem. The cool thing is that you can also specify which image types you want returned (jpg, png, gif).

Give it a try: https://github.com/tiagopog/scrapifier

Hope you enjoy.

Tiago G.
1

Assuming the src attribute is an absolute URL, maybe something like:

if item['src'] =~ /([^\/]+)$/
    # $1 is the text after the last slash, used as the local filename
    File.open($1, 'wb') {|f| f.write(URI.open(item['src']).read)} # URI.open on Ruby 3.0+
end
pguardiario
  • @ZackShapiro That's a regular expression that matches "one or more characters that are not a forward slash, as long as they touch the end of the string"; in this case @pguardiario is using it to get the filename so that `$1` can be used to save a file with that name. It's a geeky form of the `File.basename(uri)` part of my answer. – Phrogz Nov 09 '11 at 22:58
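For comparison, here is a rough sketch of the two filename approaches side by side. The helper names and the example URL are hypothetical. One difference worth knowing: if the image URL has a query string, the regex keeps it in the filename, whereas parsing the URL first and taking the basename of its path drops it.

```ruby
require 'uri'

# Hypothetical helper: filename via the regex from this answer
# (everything after the last slash, up to the end of the string).
def filename_by_regex(url)
  url[/([^\/]+)$/, 1]
end

# Hypothetical helper: filename via File.basename on the URL's path only,
# so a query string such as ?size=large is not included.
def filename_by_path(url)
  File.basename(URI.parse(url).path)
end

url = 'http://example.com/img/photo.jpg?size=large' # placeholder URL

puts filename_by_regex(url) # photo.jpg?size=large
puts filename_by_path(url)  # photo.jpg
```

Note that `URI.parse` raises on URLs it considers invalid (for example, ones containing unescaped spaces), so real blog markup may need extra handling.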
-1
system 'wget', item['src'] # wrapping this in %x{} would run wget, then pass its output to system as a second command

Edit: This is assuming you're on a unix system with wget :) Edit 2: Updated code for grabbing the img src from nokogiri.
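If you do shell out to wget, one possible sketch is to pass the program and its arguments to `system` separately, which skips the shell so odd characters in a URL are not interpreted. The helper name `wget_args` and the `images` directory are assumptions for illustration; `--directory-prefix` is wget's option for choosing the download directory.

```ruby
# Hypothetical helper: build the argument list for a wget call.
# Each element is passed to wget as-is, without going through a shell.
def wget_args(src, dir = '.')
  ['wget', '--directory-prefix', dir, src]
end

args = wget_args('http://example.com/img/photo.jpg', 'images') # placeholder URL and directory
puts args.inspect

# To actually download (requires wget on the PATH):
#   system(*args)
```

The multi-argument form of `system` returns true when the command exits successfully, so the result can be checked for each image.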

Steven Jackson