3

In Imgur, you can input an image URL and, a few seconds later, there's a thumbnail of the image. Or in Bing Search, you can (or used to be able to) view a thumbnail of the website in the search results before visiting it.

I would love to implement something similar for my website, but I can't wrap my head around how it is done. Moreover, are there not security concerns? I'd imagine the servers have to at least download the website, render it and take a screenshot. What if it's a malicious website, and you download something malicious onto your server?

dyip1
  • I found this: http://url2png.com/ by googling 'website screenshot api' – gerrytan Sep 18 '13 at 23:13
  • I think it has been answered before at http://stackoverflow.com/questions/757675/website-screenshots-using-php – Source Sep 18 '13 at 23:19
  • Reddit just picks a [seemingly random] image from the linked page and then resizes and crops it. Imgur has the image so it's just resized. Bing/Google providing a rendered screen of the page itself is super-complex and your best bet is likely the API that @gerrytan linked. – Sammitch Sep 18 '13 at 23:19
  • @gerrytan, thanks for the link but I would like to implement this as my own service to reduce latency from calling an external API if possible (and it's an opportunity to learn more!). – dyip1 Sep 18 '13 at 23:20
  • You're basically going to have to implement a DOM rendering engine [[like Gecko](http://en.wikipedia.org/wiki/Gecko_%28layout_engine%29)] and probably a JavaScript engine as well to get proper renders of modern sites. To say nothing of what happens when there's embedded Flash. – Sammitch Sep 18 '13 at 23:27
  • @gerrytan, ha! Check out the "bored" link at the bottom of your recommended site `:=)` – halfer Sep 20 '13 at 19:02

2 Answers

2

A headless Web browser engine like PhantomJS can be used for this. See the example on their wiki. Yes, it would be prudent to run this in some sort of sandbox: feed a queue of URLs into it, then pick up the generated thumbnails from the file system.
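The queue-driven setup described above could be sketched roughly as follows. This is only an illustration under several assumptions: a `phantomjs` binary is on the PATH, the `rasterize.js` script from the PhantomJS examples sits in the working directory, and the names `is_allowed`, `thumbnail_path`, `build_command`, `render_queue` and the 30-second timeout are all made up for this example. It does not provide the sandbox itself; the worker would still need to run inside a container, VM or restricted account.

```python
import hashlib
import ipaddress
import subprocess
from pathlib import Path
from urllib.parse import urlsplit

RENDER_TIMEOUT_SECONDS = 30  # assumed limit so a hanging page cannot block the worker


def is_allowed(url):
    """Accept only http(s) URLs that do not point at a literal non-public IP."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # A host name rather than an IP literal. A real service should
        # also check which address the name resolves to.
        return True
    return address.is_global


def thumbnail_path(url, out_dir):
    """Derive the output file name from a hash, never from the URL text itself."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(out_dir) / f"{digest}.png"


def build_command(url, out_file):
    # Argument list, no shell: the URL is never interpreted by a shell.
    return ["phantomjs", "rasterize.js", url, str(out_file)]


def render_queue(urls, out_dir, run=subprocess.run):
    """Render each queued URL; map it to its image path, or None on failure."""
    results = {}
    for url in urls:
        if not is_allowed(url):
            results[url] = None
            continue
        out_file = thumbnail_path(url, out_dir)
        try:
            run(build_command(url, out_file),
                timeout=RENDER_TIMEOUT_SECONDS, check=True)
            results[url] = out_file
        except (subprocess.SubprocessError, OSError):
            # Timeout, non-zero exit status, or missing binary.
            results[url] = None
    return results
```

Something like `render_queue(["https://example.com/"], "thumbs")` would then leave a full-size PNG in `thumbs/`, which a separate step can scale down to the final thumbnail. The `run` parameter exists only so the renderer can be swapped out, for instance in tests.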

Vasiliy Faronov
0

While I don't know the internal workings of any of the aforementioned services, I'd guess that they download/create a local copy of the images and generate a thumbnail from that.

Imgur, as an image hosting service, definitely needs a copy of the image prior to being able to generate thumbnails or anything else from it. The image may be stored locally or just in memory, but either way, it must be downloaded.

The search engines displaying screenshots of the sites likely have services that periodically take a screenshot of the viewable area when the content is getting indexed, and then serve those screenshots (or derivatives) along with the search results. Taking a screenshot really isn't dangerous, so there's nothing to worry about there, and whatever tools are used to load/parse/index the websites will obviously be written with security considerations in mind.

Of course, there are security concerns about the data you're downloading, too. An image can easily carry executable code (such as PHP) in its EXIF data, which becomes dangerous if the server ever treats the file as a script, so you need to be careful about where you store the images and how you serve them.
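One possible precaution for the EXIF case is to drop metadata segments from a downloaded JPEG before storing it. The following is a minimal sketch using only the standard library; the function name `strip_jpeg_metadata` is made up for this example, and it assumes a well-formed baseline JPEG. It removes only APP1 (EXIF/XMP) and comment segments, so it does not address payloads hidden elsewhere in the file, and it is no substitute for never letting the server execute uploaded files.

```python
import struct

SOI = b"\xff\xd8"        # start-of-image marker
SOS = 0xDA               # start of scan: compressed image data follows
DROPPED = {0xE1, 0xFE}   # APP1 (EXIF/XMP) and COM (comment) segments


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with its APP1 and comment segments removed."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = data[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it unchanged.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(data):
            raise ValueError("bad segment length")
        if marker not in DROPPED:
            out += data[pos:end]
        pos = end
    raise ValueError("no image data found")
```

Decoding the image and re-encoding it with an imaging library is a more thorough alternative, since it also discards anything that is not valid image data.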