I have been trying to write a script that scrapes a page for images, following the approach outlined in "Save all image files from a website".

I tested that method with another page and it worked fine, but when I point it at my page, whose images use data:image URIs that look like:

data:image/jpg;base64,/9j/4FEJFOIEJNFOEJOIAD//gAQTGFGRGREGg2LjEwMAD/2wBDAAgEBAQEREGREWGRWEGUFBQYGBgYGBgYGB...

I get an error that begins with "`initialize': File name too long" and ends with "(Errno::ENAMETOOLONG)".

Has anyone found a way to deal with situations like this?

jmarcs
  • Why do you think this is a Nokogiri-related question? The question you reference uses Nokogiri, but yours doesn't need it to solve the problem you're asking about. – the Tin Man Apr 24 '14 at 15:25
  • I used the question I mentioned as the basis for my script which also uses Nokogiri. I just needed assistance with a particular use case. – jmarcs Apr 24 '14 at 17:33
  • Then, in other words, Nokogiri is not relevant to the question. Just because your code uses it doesn't matter if Nokogiri is not mentioned, or used, in the sample code used in your question. The tags help others locate a question; Please use them accurately. – the Tin Man Apr 24 '14 at 18:48

1 Answer

data:image URLs contain the image itself, inline and Base64-encoded, so there is nothing to download. All you need to do is grab that data and decode it:

require 'base64'

# `url` is the data:image URI string; 'image.jpg' is a placeholder output name
File.open('image.jpg', 'wb') { |f| f.write(Base64.decode64(url[/base64,(.*)/, 1])) }
Uri Agassi
  • Trying that gave me an error: "`decode64': undefined method `unpack' for nil:NilClass (NoMethodError)". Using { |f| f.write(Base64.decode64(uri)) } did create a new file for each image on the page, but they were 0-byte files. – jmarcs Apr 22 '14 at 20:05
  • You tried on a `data:image` url or just on any url? This will work _only_ on the former... – Uri Agassi Apr 22 '14 at 20:07
  • Yes, each image on the page has an image tag that looks like "src="data:image/jpg;base64..." – jmarcs Apr 22 '14 at 20:11
  • did you take only the data part (`url[/base64,(.*)/, 1]`)? It seems that `uri` is `nil` by the exception... – Uri Agassi Apr 22 '14 at 20:12
  • I took everything between the curly brackets, except that I put "uri" instead of "url", because I needed to create the files in a different directory. It occurs to me that the first image tag is not a data:image. How can I skip this element? – jmarcs Apr 22 '14 at 20:15
  • wrap the line with `if url[/base64,(.*)/, 1]` – Uri Agassi Apr 22 '14 at 20:19
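
Putting the answer and the comments together, the whole flow might be sketched as below. This is only an illustration under stated assumptions: the helper names decode_data_image and save_data_images are hypothetical, the sources are assumed to be the src strings already collected from the page, and the index-based file names are one arbitrary way to avoid building a file name from the URI itself (which is what triggers Errno::ENAMETOOLONG). Sources that are not base64 data:image URIs are skipped.

```ruby
require 'base64'

# Hypothetical helper: returns [extension, decoded_bytes] for a base64
# data:image URI, or nil when the string is anything else (e.g. an http URL).
def decode_data_image(src)
  match = src.to_s.match(%r{\Adata:image/([\w.+-]+);base64,(.*)\z}m)
  return nil unless match

  [match[1], Base64.decode64(match[2])]
end

# Hypothetical helper: writes every data:image source into dir using a short,
# index-based file name, and returns the paths that were written.
def save_data_images(sources, dir)
  sources.each_with_index.map do |src, i|
    ext, bytes = decode_data_image(src)
    next unless ext # skip sources that are not data:image URIs

    path = File.join(dir, "image_#{i}.#{ext}")
    File.binwrite(path, bytes)
    path
  end.compact
end
```

With Nokogiri, as in the referenced question, the sources could be gathered with something like doc.css('img').map { |img| img['src'] } and then passed to save_data_images together with the target directory.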