I'm using the Request module to download files, but I'm not quite sure how to pipe the response to an output stream when the filename must come from the 'Content-Disposition' header. So basically, I need to read the response until the header is found, and then pipe the rest to that filename.

The examples show something like:

request('http://google.com/doodle.png').pipe(fs.createWriteStream('doodle.png'));

Whereas I want to do something like this (pseudocode):

var req = request('http://example.com/download_latest_version?token=XXX');
var filename = req.response.headers['Content-Disposition'];

req.pipe(fs.createWriteStream(filename));

I could get the filename using the Request callback:

request(url, function(err, res, body) {
 // get res headers here
});

But wouldn't that negate the benefits of using pipe and not loading the downloaded file into memory?

user3019326

3 Answers

I'm requesting an image from Yahoo, and it doesn't use the Content-Disposition header, but I am extracting the Date and Content-Type headers to construct a filename. This seems close enough to what you're trying to do...

var request = require('request'),
    fs = require('fs');

var url2 = 'http://l4.yimg.com/nn/fp/rsz/112113/images/smush/aaroncarter_635x250_1385060042.jpg';

var r = request(url2);

r.on('response', function (res) {
  // Build the filename from the Date header plus the Content-Type subtype (e.g. "jpeg")
  var filename = './' + res.headers.date + '.' + res.headers['content-type'].split('/')[1];
  res.pipe(fs.createWriteStream(filename));
});

Ignore my image choice please :)
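
The raw Date header contains colons, which are not allowed in Windows file names. If that matters, here is a minimal sketch of a helper that builds the name from the same two headers. The function name `filenameFromHeaders` and the `bin` fallback extension are my own assumptions, not part of request.

```javascript
// Hypothetical helper: derive a file name from the Date and Content-Type headers.
function filenameFromHeaders(headers) {
  // "image/jpeg; charset=binary" -> "image/jpeg"
  var type = String(headers['content-type'] || '').split(';')[0].trim();
  // "image/jpeg" -> "jpeg"; fall back to an assumed "bin" when there is no subtype
  var ext = type.split('/')[1] || 'bin';

  // Use the server's date when it parses, otherwise the current time
  var date = new Date(headers.date);
  if (isNaN(date.getTime())) {
    date = new Date();
  }

  // ISO 8601 timestamp with ":" and "." swapped for "-" to keep the name portable
  var stamp = date.toISOString().replace(/[:.]/g, '-');
  return stamp + '.' + ext;
}
```

Inside the 'response' handler this would be used as `res.pipe(fs.createWriteStream(filenameFromHeaders(res.headers)))`.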

kberg

This question has been around a while, but I faced the same problem today and solved it differently:

var Request = require( 'request' ),
    Fs = require( 'fs' );

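// RegExp to extract the filename from Content-Disposition
// (no "g" flag: a global RegExp keeps lastIndex between exec() calls,
// so reusing it for a second response could miss the match)
var regexp = /filename="(.*)"/i;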

// initiate the download
var req = Request.get( 'url.to/somewhere' )
                 .on( 'response', function( res ){

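                    // extract filename; exec() returns null when there is no quoted filename
                    var match = regexp.exec( res.headers['content-disposition'] );
                    var filename = match ? match[1] : 'download'; // 'download' is an arbitrary fallback name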

                    // create file write stream
                    var fws = Fs.createWriteStream( '/some/path/' + filename );

                    // setup piping
                    res.pipe( fws );

                    res.on( 'end', function(){
                      // go on with processing
                    });
                 });
sbugert
Sirko
    `res.on( 'end'` seems to fire too early; I think you should use `Request.get(..).on('response', ..).on('finish'`. See http://stackoverflow.com/a/11448311/138023 – Markus Hedlund Jul 01 '16 at 08:44
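
For reference, here is a minimal sketch of waiting on the write stream's 'finish' event, which fires once all data has been flushed to the file, rather than on the readable side's 'end'. It uses only Node core modules. `saveStream` is a hypothetical helper name, and `readable` stands in for the response stream.

```javascript
var fs = require('fs');

// Hypothetical helper: pipe a readable stream into a file and report back
// only after the write stream has flushed everything ('finish').
function saveStream(readable, filePath, done) {
  var dest = fs.createWriteStream(filePath);
  var called = false;

  function finish(err) {
    if (called) return; // report only the first outcome
    called = true;
    done(err || null, filePath);
  }

  readable.on('error', finish); // read-side failure
  dest.on('error', finish);     // write-side failure (bad path, disk full, ...)
  dest.on('finish', function () { finish(); });

  readable.pipe(dest);
}
```

In the answer above, the call would look roughly like `saveStream(res, '/some/path/' + filename, function (err) { /* go on with processing */ })`.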

Here's my solution:

var fs = require('fs');
var request = require('request');
var through2 = require('through2');

var req = request(url);
req.on('error', function (e) {
    // Handle connection errors
    console.log(e);
});
var bufferedResponse = req.pipe(through2(function (chunk, enc, callback) {
    this.push(chunk);
    callback()
}));
req.on('response', function (res) {
    if (res.statusCode === 200) {
        try {
            var contentDisposition = res.headers['content-disposition'];
            var match = contentDisposition && contentDisposition.match(/(filename=|filename\*='')(.*)$/);
            var filename = match && match[2] || 'default-filename.out';
            var dest = fs.createWriteStream(filename);
            dest.on('error', function (e) {
                // Handle write errors
                console.log(e);
            });
            dest.on('finish', function () {
                // The file has been downloaded
                console.log('Downloaded ' + filename);
            });
            bufferedResponse.pipe(dest);
        } catch (e) {
            // Handle request errors
            console.log(e);
        }
    }
    else {
        // Handle HTTP server errors
        console.log(res.statusCode);
    }
});

The other solutions posted here use res.pipe, which can fail if the content is transferred using gzip encoding, because the response stream contains the raw (compressed) HTTP data. To avoid this problem you have to use request.pipe instead. (See the second example at https://github.com/request/request#examples.)

When using request.pipe I was getting an error: "You cannot pipe after data has been emitted from the response.", because I was doing some async stuff before actually piping (creating a directory to hold the downloaded file). I also had some problems where the file was being written with no content, which might have been due to request reading the HTTP response and buffering it.

So I ended up creating an intermediate buffering stream with through2, so that I could pipe the request into it before the response handler fires and then pipe from the buffering stream into the file stream once the filename is known.
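
As a side note, a through2 transform that only pushes each chunk along behaves like Node's built-in stream.PassThrough, so the same idea can be sketched without the extra dependency. `bufferForLater` is a hypothetical name. The PassThrough only holds data up to its internal high-water mark and then applies backpressure to the source, so this does not load the whole file into memory.

```javascript
var stream = require('stream');

// Hypothetical helper: attach a PassThrough to `source` immediately, and
// return it so it can be piped to the real destination later on.
// Note: pipe() does not forward 'error' events, so keep listening on `source`.
function bufferForLater(source) {
  var buffered = new stream.PassThrough();
  source.pipe(buffered);
  return buffered;
}
```

With request, that would be `var bufferedResponse = bufferForLater(req);` followed by `bufferedResponse.pipe(dest)` inside the 'response' handler.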

Finally, I parse the Content-Disposition header in both cases: when the filename is given in plain form, and when it is given in UTF-8 form using the filename*=''file.txt syntax.
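
Servers vary in how they write this header: the name may be quoted or unquoted, and the extended form usually carries a charset prefix and percent-encoding, as in filename*=UTF-8''file.txt. Below is a rough sketch of a more tolerant parser. `filenameFromContentDisposition` is a hypothetical helper, it assumes the extended value is UTF-8 (the declared charset is ignored), and it is not a full RFC 6266 implementation.

```javascript
var path = require('path');

// Hypothetical helper: pull a file name out of a Content-Disposition value,
// returning `fallback` when nothing usable is found.
function filenameFromContentDisposition(header, fallback) {
  var value = header || '';
  var name;

  // Extended form, e.g. filename*=UTF-8''na%C3%AFve.txt (percent-encoded)
  var ext = /filename\*\s*=\s*[^']*'[^']*'([^;]+)/i.exec(value);
  if (ext) {
    try {
      name = decodeURIComponent(ext[1].trim());
    } catch (e) {
      // malformed percent-encoding: fall through to the plain form
    }
  }

  // Plain form, quoted or unquoted: filename="file.txt" or filename=file.txt
  if (!name) {
    var plain = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(value);
    if (plain) {
      name = (plain[1] !== undefined ? plain[1] : plain[2]).trim();
    }
  }

  // basename() drops any directory part so the server cannot pick the target folder
  return (name && path.basename(name)) || fallback;
}
```

In the handler above this would replace the match/filename lines with `var filename = filenameFromContentDisposition(res.headers['content-disposition'], 'default-filename.out');`.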

I hope this helps someone else who experiences the same issues that I had.

chris