
Is there any way to copy large files with Node.js that is both fast and reports progress?

Solution 1: fs.createReadStream().pipe(...) = useless, up to 5× slower than native cp

See: Fastest way to copy file in node.js; progress information is possible (with the npm package 'progress-stream'):

const fs = require('fs');
fs.createReadStream('test.log').pipe(fs.createWriteStream('newLog.log'));

The only problem with this approach is that it easily takes 5 times longer than `cp source dest`. See the appendix below for the full test code.

Solution 2: rsync --info=progress2 = just as slow as solution 1 = useless

Solution 3: My last resort: write a native module for Node.js, using coreutils (the Linux sources for cp and friends) or other functions, as shown in Fast file copy with progress.

Does anyone know a better option than solution 3? I'd like to avoid native code, but it seems the best fit.

Thanks! Any package recommendations or hints (I've tried everything fs*) are welcome!

Appendix:

test code, using pipe and progress:

var path = require('path');
var progress = require('progress-stream');
var fs = require('fs');
var _source = path.resolve('../inc/big.avi'); // 1.5 GB
var _target = '/tmp/a.avi';

var stat = fs.statSync(_source);
var str = progress({
    length: stat.size,
    time: 100
});

str.on('progress', function(progress) {
    console.log(progress.percentage);
});

function copyFile(source, target, cb) {
    var cbCalled = false;


    var rd = fs.createReadStream(source);
    rd.on("error", function(err) {
        done(err);
    });

    var wr = fs.createWriteStream(target);

    wr.on("error", function(err) {
        done(err);
    });

    wr.on("close", function(ex) {
        done();
    });

    rd.pipe(str).pipe(wr);

    function done(err) {
        if (!cbCalled) {
            console.log('done');
            cb && cb(err);
            cbCalled = true;
        }
    }
}
copyFile(_source, _target);

update: a fast C version (with detailed progress!) is implemented here: https://github.com/MidnightCommander/mc/blob/master/src/filemanager/file.c#L1480. Seems the best place to start from :-)

xamiro
  • Have you tried either grunt (using grunt-contrib-copy) or just a simple `require('child_process').exec('cp source dest');`? – jperezov Dec 07 '15 at 20:29
  • What exactly is your goal here? Node will probably never be as fast as native tools like `cp`, so you must have a specific reason why you want to implement it like this? – robertklep Dec 07 '15 at 21:48
  • @jperezov yes, it performs like all the others do. The progress output is nicer, though! – xamiro Dec 07 '15 at 22:16
  • @robertklep speed! nothing else. I remember even faster ASM cp versions, I just can't find them anymore... – xamiro Dec 07 '15 at 22:17

6 Answers


One aspect that may slow the process down is console.log itself. Take a look at this code:

const fs = require('fs');
const sourceFile = 'large.exe'
const destFile = 'large_copy.exe'

console.time('copying')
fs.stat(sourceFile, function(err, stat){
  const filesize = stat.size
  let bytesCopied = 0

  const readStream = fs.createReadStream(sourceFile)

  readStream.on('data', function(buffer){
    bytesCopied += buffer.length
    let percentage = ((bytesCopied/filesize)*100).toFixed(2)
    console.log(percentage+'%') // run once with this line and once with it commented out
  })
  readStream.on('end', function(){
    console.timeEnd('copying')
  })
  readStream.pipe(fs.createWriteStream(destFile));
})

Here are the execution times for copying a 400 MB file:

with console.log: 692.950ms

without console.log: 382.540ms

Tulio Faria

cpy and cp-file both support progress reporting.

Yury Solovyov

I have the same issue. I want to copy large files as fast as possible and want progress information. I created a test utility that tests the different copy methods:

https://www.npmjs.com/package/copy-speed-test

You can run it simply with:

npx copy-speed-test --source someFile.zip --destination someNonExistentFolder

It does a native copy using child_process.exec(), a file copy using fs.copyFile, and it uses createReadStream with a variety of buffer sizes (you can change the buffer sizes by passing them on the command line; run npx copy-speed-test -h for more info).

Some things I learnt:

  • fs.copyFile is just as fast as native
  • you can get quite inconsistent results on all these methods, particularly when copying from and to the same disc and with SSDs
  • if using a large buffer then createReadStream is nearly as good as the other methods
  • if you use a very large buffer then the progress is not very accurate.

The last point is because the progress is based on the read stream, not the write stream. If you copy a 1.5GB file with a 1GB buffer, the progress immediately jumps to 66%, then to 100%, and you then have to wait whilst the write stream finishes writing. I don't think that you can display the progress of the write stream.

If you have the same issue, I would recommend running these tests with file sizes similar to what you will be dealing with, and across similar media. My end use case is copying a file from an SD card plugged into a Raspberry Pi across a network to a NAS, so that's the scenario I ran the tests for.

I hope someone other than me finds it useful!

Roaders

I solved a similar problem (using Node v8 or v10) by changing the buffer size. I think the default buffer size is around 16kb, which fills and empties quickly but requires a full cycle around the event loop for each operation. I changed the buffer to 1MB, and writing a 2GB image dropped from around 30 minutes to 5, which sounds similar to what you are seeing. My image was also decompressed on the fly, which possibly exacerbated the problem. Documentation on stream buffering has been in the manual since at least Node v6: https://nodejs.org/api/stream.html#stream_buffering

Here are the key code components you can use:

let offset = 0;     // bytes read so far
let gzSize = 1;     // do not initialize divisors to 0
const hwm = { highWaterMark: 1024 * 1024 }
const inStream = fs.createReadStream( filepath, hwm );

// Capture the filesize for showing percentages
inStream.on( 'open', function fileOpen( fdin ) {
    inStream.pause();         // wait for fstat before starting
    fs.fstat( fdin, function( err, stats ) {
        gzSize = stats.size;
        // openTargetDevice does a complicated fopen() for the output.
        // This could simply be inStream.resume()
        openTargetDevice( gzSize, targetDeviceOpened );
    });
});

inStream.on( 'data', function shaData( data ) {
    const bytesRead = data.length;
    offset += bytesRead;
    console.log( `Read ${offset} of ${gzSize} bytes, ${Math.floor( offset * 100 / gzSize )}% ...` );
    // Write to the output file, etc.
});

// Once the target is open, I convert the fd to a stream and resume the input.
// For the purpose of example, note only that the output has the same buffer size.
function targetDeviceOpened( error, fd, device ) {
    if( error ) return exitOnError( error );

    const writeOpts = Object.assign( { fd }, hwm );
    outStream = fs.createWriteStream( undefined, writeOpts );
    outStream.on( 'open', function fileOpen( fdin ) {
        // In a simpler structure, this is in the fstat() callback.
        inStream.resume();    // we have the _input_ size, resume read
    });

    // [...]
}

I have not made any attempt to optimize these further; the result is similar to what I get on the command line using 'dd', which is my benchmark.

I left in converting a file descriptor to a stream and using the pause/resume logic so you can see how these might be useful in more complicated situations than the simple fs.statSync() in your original post. Otherwise, this is simply adding the highWaterMark option to Tulio's answer.

rand'Chris

Here is what I'm trying to use now, it copies 1 file with progress:

String.prototype.toHHMMSS = function () {
    var sec_num = parseInt(this, 10); // don't forget the second param
    var hours   = Math.floor(sec_num / 3600);
    var minutes = Math.floor((sec_num - (hours * 3600)) / 60);
    var seconds = sec_num - (hours * 3600) - (minutes * 60);

    if (hours   < 10) {hours   = "0"+hours;}
    if (minutes < 10) {minutes = "0"+minutes;}
    if (seconds < 10) {seconds = "0"+seconds;}
    return hours+':'+minutes+':'+seconds;
}

var purefile = "20200811140938_0002.MP4";
var filename = "/sourceDir/" + purefile;      // note the path separator
var output = "/destinationDir/" + purefile;

var progress = require('progress-stream');
var fs = require('fs');

const convertBytes = function(bytes) {
  const sizes = ["Bytes", "KB", "MB", "GB", "TB"]

  if (bytes == 0) {
    return "n/a"
  }

  const i = parseInt(Math.floor(Math.log(bytes) / Math.log(1024)))

  if (i == 0) {
    return bytes + " " + sizes[i]
  }

  return (bytes / Math.pow(1024, i)).toFixed(1) + " " + sizes[i]
}
 
var copiedFileSize = fs.statSync(filename).size;
var str = progress({
    length: copiedFileSize, // length(integer) - If you already know the length of the stream, then you can set it. Defaults to 0.
    time: 200,              // time(integer) - Sets how often progress events are emitted in ms. If omitted then the default is to do so every time a chunk is received.
    speed: 1,               // speed(integer) - Sets how long the speedometer needs to calculate the speed. Defaults to 5 sec.
//  drain: true,            // drain(boolean) - In case you don't want to include a readstream after progress-stream, set to true to drain automatically. Defaults to false.
//  transferred: false      // transferred(integer) - If you want to set the size of previously downloaded data. Useful for a resumed download.
});

     /*
    {
        percentage: 9.05,
        transferred: 949624,
        length: 10485760,
        remaining: 9536136,
        eta: 42,
        runtime: 3,
        delta: 295396,
        speed: 949624
    }
    */

str.on('progress', function(progress) {
    console.log(progress.percentage + '%');
    console.log('elapsed: ' + progress.runtime.toString().toHHMMSS() + ' / remaining: ' + progress.eta.toString().toHHMMSS());
    console.log(convertBytes(progress.speed) + '/s ' + progress.speed);
});

//const hwm = { highWaterMark: 1024 * 1024 } ;
var hrstart = process.hrtime(); // measure the copy time
var rs=fs.createReadStream(filename)
    .pipe(str)
    .pipe(fs.createWriteStream(output, {emitClose: true}).on("close", () => {
        var hrend = process.hrtime(hrstart);

        var timeInSeconds = (hrend[0] * 1000000000 + hrend[1]) / 1000000000;
        var finalSpeed = convertBytes(copiedFileSize / timeInSeconds);

        console.log('Done: file copy: ' + finalSpeed + '/s');
        console.info('Execution time (hr): %ds %dms', hrend[0], hrend[1] / 1000000);
}) );
HyGy

Refer to https://www.npmjs.com/package/fsprogress.

With that package, you can track progress while copying or moving files. The progress tracking is event- and method-call based, so it's very convenient to use.

You can provide options to control a lot of things, e.g. the number of files to operate on concurrently, or the chunk size to read from a file at a time. It was tested with single files up to 17GB and with quite large directories, so it is safe to use for large file(s).

So go ahead and have a look at whether it matches your expectations or is what you are looking for.

Bhuwan pandey