
I am trying to build a scraper to download video streams and save them to a private cloud instance using Nightmare.js (http://www.nightmarejs.org/)

I have seen the documentation, and it shows how to download simple files like this -

.evaluate(function ev(){
    var el = document.querySelector("[href*='nrc_20141124.epub']");
    var xhr = new XMLHttpRequest();
    xhr.open("GET", el.href, false);
    xhr.overrideMimeType("text/plain; charset=x-user-defined");
    xhr.send();
    return xhr.responseText;
}, function cb(data){
    var fs = require("fs");
    fs.writeFileSync("book.epub", data, "binary");
})

-- based on the SO post here -> Download a file using Nightmare

But I want to download video streams using the Node.js async streams API. Is there a way to open a stream from a remote URL and pipe it to a local / other remote writable stream using Node.js's built-in stream APIs?

Harshit Laddha
1 Answer


You can check whether the server sends the "Accept-Ranges" (14.5) and "Content-Length" (14.13) headers by making a HEAD request for that file, then request smaller chunks of the file you're trying to download using the "Range" (14.35) request header and write each chunk to the target file (you can open the file in append mode in order to reduce management of the file stream).
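The range arithmetic can be sketched like this (the function name and chunk size are my own choices, not from Nightmare or any library; the actual HEAD/GET requests are omitted):

```javascript
// Split a Content-Length reported by a HEAD request into "Range"
// request-header values, one per chunk.
function buildRangeHeaders(contentLength, chunkSize) {
  var ranges = [];
  for (var start = 0; start < contentLength; start += chunkSize) {
    // Byte ranges are inclusive, so the last byte of a chunk is
    // start + chunkSize - 1, capped at the final byte of the file.
    var end = Math.min(start + chunkSize - 1, contentLength - 1);
    ranges.push("bytes=" + start + "-" + end);
  }
  return ranges;
}

// e.g. a 10-byte file in 4-byte chunks:
console.log(buildRangeHeaders(10, 4));
// → [ 'bytes=0-3', 'bytes=4-7', 'bytes=8-9' ]
```

You would then issue one GET per entry, setting each value as the request's Range header, and append each response body to the target file.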

Of course, this will be quite slow if you request very small chunks sequentially. You could build a pool of requestors (e.g. 4) and write only the next expected chunk to the file, so requestors that finish early hold their result until the chunks before it have been written, rather than racing ahead to future chunks.
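The in-order write step can be sketched like this (names are illustrative and the HTTP requests themselves are omitted; each pooled requestor just calls the returned function when its chunk finishes):

```javascript
// Buffer chunks that finish out of order and flush them to the
// write callback strictly in index order.
function createOrderedWriter(writeChunk) {
  var pending = {}; // finished chunks waiting their turn, keyed by index
  var next = 0;     // index of the next chunk allowed to be written
  return function onChunkDone(index, data) {
    pending[index] = data;
    // Flush every chunk that is now contiguous with what was written.
    while (pending[next] !== undefined) {
      writeChunk(pending[next]);
      delete pending[next];
      next++;
    }
  };
}

// Simulate four requestors finishing out of order:
var written = [];
var done = createOrderedWriter(function (d) { written.push(d); });
done(2, "c"); // buffered, chunk 0 not written yet
done(0, "a"); // flushes "a" only; chunk 1 still missing
done(1, "b"); // flushes "b", then the buffered "c"
done(3, "d"); // flushes "d"
console.log(written.join("")); // → "abcd"
```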

Artjom B.