How to get the jpg from this url?

Question

This API provides thumbnails from websites.

<img style="-webkit-user-select: none" src="http://webthumb.bluga.net/easythumb.php?user=00000&url=www.consumerreports.com&hash=sdf9g879d8f7g9sd8fg7s9df&size=medium&cache=30">

The user id and hash value have been redacted, but if they were right, this tag would result in a small thumbnail on your page called easythumb.jpeg.

Is there any way to grab that thumbnail and store it either in my DB or in AWS?

Or was webthumb carefully designed to prevent such behavior?

Edit:

Tried Nokogiri per the suggestion below, and here is the return. It looks like there's no way to get a jpg out of this. Am I right?

Possible duplicate? http://stackoverflow.com/questions/1074309/how-do-i-download-a-picture-using-ruby — Joel Brewer, Nov 01 '14 at 02:44
Ahh. Check this out: http://stackoverflow.com/questions/7926675/save-all-image-files-from-a-website -- The top answer suggests using Nokogiri, which I've also used for scraping/downloading — Joel Brewer, Nov 01 '14 at 03:05
Did you select the `img` tags? Something like this: (from the linked answer) `Nokogiri::HTML(open(URL)).xpath("//img/@src").each do |src| uri = URI.join( URL, src ).to_s # make absolute uri File.open(File.basename(uri),'wb'){ |f| f.write(open(uri).read) } end` — Joel Brewer, Nov 01 '14 at 03:32
I didn't because I didn't see "img" anywhere in that mass of text that was returned to me. Also don't know what that 'xpath' does. Is that the secret sauce which will get my jpeg? Guess I'll start reading up on nokogiri! — dwilbank, Nov 01 '14 at 03:45
Well, `img` simply looks for any image tags in the HTML. Not sure if you would see that in the Nokogiri output. Xpath is simply a way to traverse the DOM -- nothing too magical. Good luck! — Joel Brewer, Nov 01 '14 at 03:47
That "mass of text" is probably your image. You need to save it. — , Nov 01 '14 at 04:04
We need to see source code showing what you're trying. As is, you're asking us to imagine it, so this isn't a complete question. Nokogiri parses HTML; it doesn't retrieve anything. OpenURI does that for your code, and is all that is necessary to retrieve an image, after which you can save the returned content to a file. — the Tin Man, Nov 03 '14 at 21:21
Also, don't use screen shots to show us important text. Screen shots are not easy to read, nor is the text in it easily reused if we need to do so. — the Tin Man, Nov 03 '14 at 21:39

score 2 · Accepted Answer · answered Nov 03 '14 at 21:37

It's important to understand which tool does what: Nokogiri parses the HTML, and OpenURI retrieves the content. Here's some code, which has been tested to the point of downloading the image:

require 'nokogiri'
require 'open-uri'

html = '<img style="-webkit-user-select: none" src="http://webthumb.bluga.net/easythumb.php?user=00000&url=www.consumerreports.com&hash=sdf9g879d8f7g9sd8fg7s9df&size=medium&cache=30">'
doc = Nokogiri::HTML(html)

uri = URI.parse(doc.at('img')['src']) 
# => #<URI::HTTP:0x007f8e13258520 URL:http://webthumb.bluga.net/easythumb.php?user=00000&url=www.consumerreports.com&hash=sdf9g879d8f7g9sd8fg7s9df&size=medium&cache=30>

File.basename(uri.path) 
# => "easythumb.php"

# URI.open is used because Kernel#open no longer opens URLs as of Ruby 3.0
File.open(File.basename("#{ uri.path }.jpeg"), 'wb') { |fo| fo.write(URI.open(uri).read) }

All that said, the URL as posted isn't valid. Opening a browser page and pasting in that URL returns "Bad Hash", not an image.
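
Because the server can answer with an error message such as "Bad Hash" instead of image data, it may be worth checking the response before saving it. Below is a minimal sketch, assuming the service sends an `image/*` Content-Type on success (not verified here). The helper names `thumbnail_filename`, `save_if_image`, and `fetch_thumbnail` are made up for illustration, and Nokogiri is left out since the URL is already known:

```ruby
require 'open-uri'
require 'uri'

# Derive a local file name from the URL path,
# e.g. "/easythumb.php" becomes "easythumb.jpeg".
def thumbnail_filename(url)
  path = URI.parse(url).path
  "#{File.basename(path, '.*')}.jpeg"
end

# Write the response body to disk only if the server says it is an image.
# `response` is anything that responds to #content_type and #read,
# which is what URI.open yields for an HTTP URL.
def save_if_image(response, filename)
  unless response.content_type.to_s.start_with?('image/')
    raise ArgumentError, "expected an image, got #{response.content_type}"
  end
  File.binwrite(filename, response.read)
  filename
end

# Hypothetical wrapper tying the two together (performs a network request).
def fetch_thumbnail(url)
  URI.open(url) { |response| save_if_image(response, thumbnail_filename(url)) }
end
```

Calling `fetch_thumbnail` with the full easythumb.php URL (with a valid user and hash) should write `easythumb.jpeg` to the current directory, or raise if the response is not an image. The saved bytes could then be stored in a database column or uploaded elsewhere.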
