Download CSV (or other non-HTML data) with PhantomJS

Question

How can I access simple csv data?

    var webpage = require('webpage');
    var csvPage = webpage.create();
    var csvUrl= "http://www.scoach.ch/arcmsdownload/023c5c5aa58e6e0ff963ddcdea5ac016/CONTENT.csv/derivatives_2013-05-24.csv";

    csvPage.open(csvUrl, function(status){
      console.log("csv: " + csvPage.content);
    });

This gives me just an empty HTML page, which is not the expected result :-) I have tried several callbacks, but none of them helped.

Thanks for your help!

Did you ever figure out the answer to this? — Justin Bicknell, Jan 05 '14 at 21:41

Darren Cook · Accepted Answer · 2013-05-28T01:25:12.670

First, I'll just quickly point out that PhantomJS is overkill for this job. Use wget, curl, PHP file_get_contents, etc. However, I'm assuming this is part of a more complicated PhantomJS script, and you have a good reason.

I can only half answer your question, by showing you how to see the missing error messages:

    var webpage = require('webpage');
    var csvPage = webpage.create();
    var csvUrl = "http://www.scoach.ch/arcmsdownload/023c5c5aa58e6e0ff963ddcdea5ac016/CONTENT.csv/derivatives_2013-05-24.csv";

    csvPage.open(csvUrl, function(status){
      // status is "success" or "fail"
      console.log("status=" + status);
      // plainText returns the body without the HTML wrapper
      console.log("csv: " + csvPage.plainText);
      phantom.exit();
    });

I made these changes:

1. Show the status (it is "fail").
2. Use plainText instead of content. (The latter wraps your content in HTML tags, which you don't want for CSV.)
3. Add phantom.exit(), so the script doesn't sit there at the end.

I don't know why the status is "fail", when I can get the file fine with wget. The next troubleshooting step is to add these two handlers before calling csvPage.open:

    csvPage.onResourceRequested = function (request) {
        console.log('Request ' + JSON.stringify(request, undefined, 4));
    };
    csvPage.onResourceReceived = function (response) {
        console.log('Receive ' + JSON.stringify(response, undefined, 4));
    };

The request returns immediately with 3878 bytes, even though the Content-Length header says 6,335,428. This might be a PhantomJS bug/limitation with either chunked encoding or very large files.
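To make that size mismatch easier to spot than in the full JSON dump, the handler could print only the two numbers. This is a rough sketch, not a fix: headerValue and summarize are hypothetical helper names, and it assumes the response object carries a headers array of {name, value} pairs plus a bodySize field, as in the dump above. It also assumes csvPage is the page object from the script above.

```javascript
// Hypothetical helper: case-insensitive lookup in the response.headers
// array (assumed shape: [{ name: ..., value: ... }, ...]).
function headerValue(response, name) {
    var headers = response.headers || [];
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === name.toLowerCase()) {
            return headers[i].value;
        }
    }
    return null;
}

// Hypothetical helper: one-line summary of received vs. announced size.
function summarize(response) {
    return 'bodySize=' + response.bodySize +
           ' Content-Length=' + headerValue(response, 'Content-Length');
}

// Attach the handler only when the page object from the script above exists.
if (typeof csvPage !== 'undefined') {
    csvPage.onResourceReceived = function (response) {
        console.log(response.stage + ' ' + response.url + ' ' + summarize(response));
    };
}
```

If the two numbers differ at the "end" stage as well, that would support the truncation theory.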

UPDATE: Another idea, for a short-term solution, is to call wget or curl from inside your PhantomJS script, using the new spawn or execFile commands: http://code.google.com/p/phantomjs/source/browse/examples/child_process-examples.js
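A minimal sketch of that workaround, modelled on the linked example. It assumes curl is installed and on the PATH; curlArgs is a hypothetical helper name, and I have not tested this against the URL in the question.

```javascript
// Hypothetical helper: build the curl argument list for a URL.
// -s hides the progress meter, -S still prints errors, -L follows redirects.
function curlArgs(url) {
    return ['-sSL', url];
}

// Only run the download when executing under PhantomJS.
if (typeof phantom !== 'undefined') {
    var execFile = require('child_process').execFile;
    var csvUrl = "http://www.scoach.ch/arcmsdownload/023c5c5aa58e6e0ff963ddcdea5ac016/CONTENT.csv/derivatives_2013-05-24.csv";

    // PhantomJS signature: execFile(command, args, options, callback)
    execFile('curl', curlArgs(csvUrl), null, function (err, stdout, stderr) {
        if (stderr) {
            console.log('curl error: ' + stderr);
        }
        console.log('csv: ' + stdout);
        phantom.exit();
    });
}
```

The CSV text arrives in stdout as a single string, so the rest of the script can parse it without going through a page object.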

Thanks! I have submitted a bug report based on this question. — KIC, May 28 '13 at 08:30

grokpot · Answer 2 · score 0 · answered Mar 12 '15 at 18:52 · edited May 23 '17 at 12:24

This SO post might help. Also note that PhantomJS is a separate JavaScript runtime from Node.js, not a Node.js module, so using Node CSV libraries isn't an option.
