
I'm currently using CasperJS (on top of the headless browser PhantomJS) for site scraping and I would like to download images from a website.

There are two approaches for this, both of which are well documented, but neither of them suits my purposes.

I could use casper.capture() to take a screenshot of a portion of the site, but the image is obscured by HTML elements displayed in front of it, so that's not an option - I need the original source of the image.

There is also casper.download(), which does work, but only when I run casperjs with --web-security=no. That is a security risk, given that I'm scraping a site that isn't my own.

It also appears that casper.on("resource.received", function(resource){}) doesn't suit my needs, since it only gives me the image metadata rather than the image itself.

I have tried to use the cache system as explained here, but that didn't work for me. Whenever I try to access cache.cachedResources[index].getContents(), casperjs crashes for an unknown reason. Using a proxy is not a viable solution either.

If anyone knows of a way to download the original image without disabling web security, that would be most appreciated. Keep in mind that I don't necessarily need it saved to a file; if I can access the byte content in CasperJS, that's also fine.

Thank you!

Insdeath
  • You can paint an image to a hidden canvas and get its DataURL, but PhantomJS is broken, because the image is not compressed. I don't know if it was fixed in version 2. Here is the way you would do that inside of the page context: [Get image data in JavaScript?](http://stackoverflow.com/questions/934012/get-image-data-in-javascript) – Artjom B. Jun 24 '15 at 17:52
  • I've tried that, but that doesn't appear to work either. Could it be because all images are hosted on another domain than the website itself? – Insdeath Jun 30 '15 at 11:55
  • If you use `--web-security=no` then this shouldn't be a problem. Since you already tried something, please register for the `resource.error`, `page.error`, `remote.message` and `casper.page.onResourceTimeout` events ([Example](https://gist.github.com/artjomb/4cf43d16ce50d8674fdf)). Maybe there are errors. – Artjom B. Jun 30 '15 at 17:17
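
For reference, the canvas approach mentioned in the comments could look roughly like the sketch below. The names `imageToDataURL` and `dataURLToBase64` are hypothetical helpers, not part of CasperJS or PhantomJS, and the `doc` parameter is only there so the function can be exercised outside a browser; in the page it would be `document`. Note the assumptions: the result is a re-encoded PNG of the pixels, not the original file bytes, and for an image served from another domain without CORS approval the canvas becomes tainted, so `toDataURL()` throws a security error unless web security is disabled.

```javascript
// Sketch only: hypothetical helpers, not a CasperJS or PhantomJS API.
// imageToDataURL must run in the page context, where the <img> element
// and `document` exist. Written in ES5 for older PhantomJS versions.
function imageToDataURL(img, doc) {
    // Create an off-screen canvas matching the image's intrinsic size.
    var canvas = doc.createElement('canvas');
    canvas.width = img.naturalWidth || img.width;
    canvas.height = img.naturalHeight || img.height;

    // Paint the image onto the canvas. A cross-origin image without CORS
    // approval taints the canvas, and toDataURL() below will then throw.
    canvas.getContext('2d').drawImage(img, 0, 0);

    // Returns re-encoded PNG pixel data, not the original file contents.
    return canvas.toDataURL('image/png');
}

// Strip the "data:image/png;base64," prefix to get the base64 payload.
function dataURLToBase64(dataURL) {
    return dataURL.slice(dataURL.indexOf(',') + 1);
}
```

In a CasperJS script these helpers would have to be defined inside the function passed to `casper.evaluate()`, since that function runs in the page and cannot see variables from the script's outer scope. The returned string can then be passed back to the CasperJS side and decoded there.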
