
I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a url.

I have been trying to use phantomjs and casperjs to automate this process, but haven't had any success.

I recently tried Brandon's solution from the question "Grab the resource contents in CasperJS or PhantomJS".

Here is my code for that:

var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
var casper = require('casper').create();

casper.start('http://www.example.com/page_with_download_button', function() {

});

casper.then(function() {
    this.click('#download_button');
});

// Cache every response whose Content-Type header matches the CSV export.
casper.on('resource.received', function (resource) {
    "use strict";
    var i; // declared explicitly; strict mode rejects implicit globals
    for (i = 0; i < resource.headers.length; i++) {
        if (resource.headers[i]["name"] == "Content-Type" && resource.headers[i]["value"] == "text/csv; charset-UTF-8;") {
            cache.includeResource(resource);
        }
    }
});

// Once the page has loaded, write each cached resource to downloads/.
casper.on('load.finished', function(status) {
    var i;
    for (i = 0; i < cache.cachedResources.length; i++) {
        var file = cache.cachedResources[i].cacheFileNoPath;
        var ext = mimetype.ext[cache.cachedResources[i].mimetype];
        var finalFile = file.replace("." + cache.cacheExtension, "." + ext);
        fs.write('downloads/' + finalFile, cache.cachedResources[i].getContents(), 'b');
    }
});

casper.run();

I think the problem could be caused by my cachePath being incorrect in cache.js

exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';

Should I be using something in addition to the backslashes to define the path?

When I try

 casperjs --disk-cache=true export_script.js

Nothing is downloaded. After a little debugging I have found that cache.cachedResources is always empty.
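
One thing that may be worth ruling out is the exact string comparison on the Content-Type value, which only matches if the server sends precisely that text. The sketch below is a more tolerant check. `isCsvResponse` is a hypothetical helper, not part of CasperJS or of the `cache` module, and it assumes `resource.headers` is an array of `{name, value}` objects as in the handler above.

```javascript
// Hypothetical helper: returns true when any Content-Type header
// names the text/csv media type, ignoring case and parameters.
function isCsvResponse(headers) {
    "use strict";
    var list = headers || [];
    var i, name, value;
    for (i = 0; i < list.length; i++) {
        name = String(list[i].name).toLowerCase();
        value = String(list[i].value).toLowerCase();
        // Compare only the media type, dropping "; charset=..." style parameters.
        if (name === "content-type" && value.split(";")[0].trim() === "text/csv") {
            return true;
        }
    }
    return false;
}
```

Inside the `resource.received` handler this could stand in for the inner loop: `if (isCsvResponse(resource.headers)) { cache.includeResource(resource); }`.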

I would also be open to solutions outside of phantomjs/casperjs.


UPDATE

I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, suggested by dandavis. Tampermonkey was extremely easy to figure out. I installed Tampermonkey, navigated to the page with the download link, clicked New Script in Tampermonkey, and added my JavaScript code.

document.getElementById("download_button").click();

Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this

set date=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
chrome "http://www.example.com/page-with-dl-button"
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%date%.csv"

I set that batch script to run nightly using the Windows Task Scheduler.
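
The `%DATE%` substring offsets in the batch script depend on the machine's regional date format, so they may produce a different stamp on a differently configured machine. As a hedged alternative, the rename step could be done with a small Node.js script. Node.js itself and the function names `dateStamp` and `archiveExport` are assumptions for illustration, not part of the setup described above.

```javascript
var fs = require("fs");
var path = require("path");

// Format a Date as YYYY_MM_DD using local time, independent of regional settings.
function dateStamp(d) {
    function pad(n) { return (n < 10 ? "0" : "") + n; }
    return d.getFullYear() + "_" + pad(d.getMonth() + 1) + "_" + pad(d.getDate());
}

// Hypothetical helper: move the downloaded file into destDir
// as export_YYYY_MM_DD.csv and return the new path.
function archiveExport(source, destDir, when) {
    var target = path.join(destDir, "export_" + dateStamp(when || new Date()) + ".csv");
    fs.renameSync(source, target); // rename only works within one drive or volume
    return target;
}
```

With the paths from the batch script, the call would be `archiveExport("C:\\Users\\user\\Downloads\\export.csv", "C:\\path\\to\\dir")`.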

Success!

user
  • Tampermonkey is a simple way to click a button when a certain page is visited. Windows scheduled tasks are a good way to open a URL on a schedule: just run the URL, or be specific and run e.g. `chrome.exe "http://example.com"` or something. You can run a shortcut file too. Either way, the browser opens the page, then Tampermonkey clicks the button. I use it all the time on a seldom-used desktop. – dandavis Mar 16 '16 at 20:13
  • Use Selenium WebDriver; see http://yizeng.me/2014/05/23/download-pdf-files-automatically-in-firefox-using-selenium-webdriver/ – NullPointerException Mar 16 '16 at 20:15
  • thanks! I had never heard of tampermonkey. – user Mar 16 '16 at 20:21
  • You probably should move your event handlers *before* `casper.start()` – Artjom B. Mar 16 '16 at 20:28
  • Backslashes need to be escaped: `exports.cachePath = 'C:\\Users\\username\\AppData\\Local\\Ofi Labs\\PhantomJS';`. Windows also supports forward slashes: `exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';` – Artjom B. Mar 16 '16 at 20:31
  • I tried both the forward slashes and the double backslashes and neither worked. I will update the question to include forward slashes. – user Mar 16 '16 at 20:34

1 Answer


Your button most likely issues a POST request to the server. In order to track it:

  1. Open the Network tab in Chrome developer tools.
  2. Navigate to the page and click the button.
  3. Note which request triggers the file download, then right-click it and copy it as cURL.
  4. Run the copied cURL command.

Once you have cURL working, you can schedule downloads using cron or Task Scheduler, depending on the operating system you are using.
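
If the copied request can be replayed outside the browser, the same POST could also be issued from a script. The following is a minimal Node.js sketch under that assumption. The function names `buildPostRequest` and `download`, the form field names, and the cookie value are all hypothetical placeholders; the real values have to be copied from the request shown in the Network tab.

```javascript
var http = require("http");
var fs = require("fs");
var url = require("url");
var querystring = require("querystring");

// Build the request options and URL-encoded body for a form POST.
// fields and cookie are placeholders supplied by the caller.
function buildPostRequest(pageUrl, fields, cookie) {
    var target = new url.URL(pageUrl);
    var body = querystring.stringify(fields);
    var headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": Buffer.byteLength(body)
    };
    if (cookie) { headers.Cookie = cookie; }
    return {
        options: {
            method: "POST",
            hostname: target.hostname,
            port: target.port || 80,
            path: target.pathname + target.search,
            headers: headers
        },
        body: body
    };
}

// Send the POST over plain HTTP and stream the response body to outFile.
// done(err, statusCode) is called on completion or on a request error.
function download(pageUrl, fields, cookie, outFile, done) {
    var req = buildPostRequest(pageUrl, fields, cookie);
    var request = http.request(req.options, function (res) {
        res.pipe(fs.createWriteStream(outFile)).on("finish", function () {
            done(null, res.statusCode);
        });
    });
    request.on("error", done);
    request.end(req.body);
}
```

Keeping the request construction separate from the network call makes it easy to compare the generated headers and body against the copied cURL command before sending anything.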

vicneanschi
  • The button does issue a POST request. The URL in the dev tools Network tab is the same as the page URL. I tried a PHP cURL solution first, but I could never get the file returned. It always returned an error message. The web page uses some kind of validation to check that the button was actually clicked. – user Mar 16 '16 at 21:18
  • Could you update your question with the error(s) you get when using the pure HTTP approach? It may be you simply need to add a cookie, and can ditch the *JS tools. – Andrew Regan Mar 16 '16 at 23:20