
Question: How can I check, either with CasperJS or with JavaScript afterwards, whether my file download was successful?

I crawl some blogs and download their images. Unfortunately, not every image actually gets downloaded.

The CasperJS script runs locally on my computer. I save the filenames of all downloaded (or failed) files into a JSON file. The download function itself gives no indication of whether it succeeded: http://casperjs.readthedocs.org/en/latest/modules/casper.html#download
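Because `casper.download()` reports nothing, one option is to verify the files afterwards against the JSON list. The sketch below assumes the JSON file is a plain array of filenames relative to the download folder, and that the check runs with Node.js after the crawl rather than inside CasperJS. `findMissingDownloads` and `checkManifest` are hypothetical helpers, not part of CasperJS:

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: given the download folder and the filenames the
// crawler recorded, return every entry that is missing, not a file, or empty.
function findMissingDownloads(downloadDir, filenames) {
  const problems = [];
  for (const name of filenames) {
    const fullPath = path.join(downloadDir, name);
    if (!fs.existsSync(fullPath)) {
      problems.push({ name, reason: "missing" });
      continue;
    }
    const stats = fs.statSync(fullPath);
    if (!stats.isFile()) {
      problems.push({ name, reason: "not a file" });
    } else if (stats.size === 0) {
      problems.push({ name, reason: "empty" });
    }
  }
  return problems;
}

// Assumes the manifest is a JSON array of filenames, e.g. ["a.jpg", "b.png"].
function checkManifest(manifestPath, downloadDir) {
  const filenames = JSON.parse(fs.readFileSync(manifestPath, "utf8"));
  return findMissingDownloads(downloadDir, filenames);
}
```

Running `checkManifest` once after the crawl yields a list that can be logged or fed back into a retry. A similar check could probably also run inside the CasperJS script through PhantomJS's own `fs` module (`fs.exists()`, `fs.size()`), although that API differs from Node's.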

FAILED APPROACH 1:

function UrlExists(url)
{
  var http = new XMLHttpRequest();
  http.open('HEAD', url, false);
  http.send();
  return http.status!=404;
}

(Taken from "How do I check if file exists in jQuery or JavaScript?") I guess this only works on a server; locally it throws the error:

NETWORK_ERR: XMLHttpRequest Exception 101: A network error occured in synchronous requests.

See also: How to check if file exists locally in JavaScript?
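Approach 1 asks whether the remote URL exists, not whether the local copy was written. A status check can still flag URLs that answer with a redirect, for example to a new domain. The following is a minimal sketch assuming Node.js, using the asynchronous `http`/`https` modules instead of a synchronous XMLHttpRequest. `headStatus` is a hypothetical helper that deliberately does not follow redirects, so a 3xx status together with `location` reveals where an image moved:

```javascript
const http = require("http");
const https = require("https");

// Sends a HEAD request and resolves with the status code and, for
// redirects, the Location header. Redirects are not followed.
function headStatus(url) {
  const client = url.startsWith("https:") ? https : http;
  return new Promise((resolve, reject) => {
    const req = client.request(url, { method: "HEAD" }, (res) => {
      res.resume(); // discard any body so the socket is released
      resolve({ status: res.statusCode, location: res.headers.location || null });
    });
    req.on("error", reject);
    req.end();
  });
}
```

For example, `headStatus(imageUrl).then((r) => console.log(r.status, r.location))` would print a 301 or 302 plus the new address for an image that moved domains.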

FAILED APPROACH 2:

I also found the following approach, which does not work in CasperJS, or at least does not display anything.

function checkImage (src) {
  console.log("check");
  var img = new Image();
  img.onload = function(){console.log("yes");};
  img.onerror = function(){console.log("no");};
  img.src = src;
}

FAILED APPROACH 3:

The last approach always returns false. I guess that is also because the JavaScript runs in a sandbox:

function ImageExist(url)
{
   var img = new Image();
   img.src = '/Users/MasterG/Desktop/PROJEKTE/paleo-crawler/' + site + '/'+url;
   console.log(img.src," - " ,img.height);
   return img.height != 0;
}
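One likely reason Approach 3 returns false: `img.height` is read right after `src` is assigned, and because images load asynchronously it is typically still 0 at that point. Rather than asking a browser to decode the file, one can inspect its first bytes and compare them against the signatures of common image formats. This also catches, for example, an HTML error or redirect page saved under a `.jpg` name. Below is a minimal Node.js sketch that covers only JPEG, PNG and GIF. `detectImageType` and `detectImageTypeOfFile` are hypothetical helpers:

```javascript
const fs = require("fs");

// Leading "magic" bytes of a few common image formats.
const SIGNATURES = [
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "gif", bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8" (GIF87a / GIF89a)
];

// Returns "jpeg", "png", "gif", or null if the header matches none of them.
function detectImageType(buffer) {
  for (const sig of SIGNATURES) {
    if (buffer.length >= sig.bytes.length &&
        sig.bytes.every((b, i) => buffer[i] === b)) {
      return sig.type;
    }
  }
  return null;
}

// Reads only the first 8 bytes of the file instead of the whole image.
function detectImageTypeOfFile(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    return detectImageType(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

A `null` result for a file named `*.jpg` is a strong hint that the crawler stored something other than an image.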

See also: Go to local URL with JavaScript

Asked by Andi Giga (edited by Community)
  • All three don't work. I included my research, as Stack Overflow suggests, because I wanted to avoid people posting or trying solutions that didn't work. Also, maybe I missed something in these solutions, e.g. file permissions. – Andi Giga Mar 17 '15 at 14:59
  • I think you should make it bold that all of those solutions do not work. Please describe how you generally see whether the image was not loaded. 1. Does the image file exist? 2. If it does, is it empty? 3. If it has content, is the image data corrupt? Furthermore, does this change if you run your script again or is it always the same image? – Artjom B. Mar 17 '15 at 15:35
  • The missing images come only from URLs that were on the owner's old domain, so there are redirects, but CasperJS can't download cross-domain. In that case I take a snapshot. But there are still some odd HTTPS domains from Amazon etc. I would feel safer if I could automatically check at the end whether all images were downloaded. Maybe one day the connection aborts, etc. (I could always check for plausibility myself, but that's not the purpose of automation.) – Andi Giga Mar 18 '15 at 13:23
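For the aborted-connection case, a file can start with a valid header yet be cut off. A rough heuristic is to inspect the trailing bytes, assuming the files are JPEG, PNG or GIF. A complete JPEG normally ends with the EOI marker `FF D9`, a PNG with its zero-length `IEND` chunk, and a GIF with the trailer byte `0x3B`. Some real-world files carry extra bytes after the end marker, so a `false` result means "inspect manually" rather than "definitely broken". This is a Node.js sketch; `endsWithTrailer` and `looksComplete` are hypothetical helpers:

```javascript
const fs = require("fs");

// Expected trailing bytes of a complete file, per format.
const TRAILERS = {
  jpeg: [0xff, 0xd9], // EOI marker
  // Last 12 bytes of a PNG: zero-length IEND chunk (length, "IEND", CRC).
  png: [0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82],
  gif: [0x3b], // GIF trailer ";"
};

// Returns true if the buffer ends with the trailer expected for `type`.
function endsWithTrailer(buffer, type) {
  const trailer = TRAILERS[type];
  if (!trailer || buffer.length < trailer.length) return false;
  const start = buffer.length - trailer.length;
  return trailer.every((b, i) => buffer[start + i] === b);
}

// Reads the whole file; fine for blog-sized images, not for huge files.
function looksComplete(filePath, type) {
  return endsWithTrailer(fs.readFileSync(filePath), type);
}
```

Combined with a header check that determines `type`, this gives a per-file "probably complete" flag that can run automatically at the end of a crawl.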

0 Answers