12

An installation process downloads a .tar.gz archive, then extracts the files to a destination directory. However, not all the files in the archive are required, and I'd like to specify which files should be extracted. The naïve way would be to delete the unnecessary files after extraction, but I'd prefer a "cleaner" way: filtering them out during extraction instead.

Is this possible?

The relevant code I have so far (stripped for readability) is:

var fs = require('fs');
var tar = require('tar');
var zlib = require('zlib');

var log = console.log;

var tarball = 'path/to/downloaded/archive.tar.gz';
var dest = 'path/to/destination';

fs.createReadStream(tarball)
  .on("error", log)
  .pipe(zlib.Unzip())
  .pipe(tar.Extract({ path: dest }))
  .on("end", log);

Thank you.

Yanick Rochon

2 Answers

14

This can be done with `tar.Parse()`, which works similarly to the unzip module:

var fs = require('fs');
var tar = require('tar');
var zlib = require('zlib');
var path = require('path');
var mkdirp = require('mkdirp'); // used to create directory tree

var log = console.log;

var tarball = 'path/to/downloaded/archive.tar.gz';
var dest    = 'path/to/destination';

fs.createReadStream(tarball)
  .on('error', log)
  .pipe(zlib.Unzip())
  .pipe(tar.Parse())
  .on('entry', function(entry) {
    if (/\.js$/.test(entry.path)) { // only extract JS files, for instance
      var isDir     = 'Directory' === entry.type;
      var fullpath  = path.join(dest, entry.path);
      var directory = isDir ? fullpath : path.dirname(fullpath);

      mkdirp(directory, function(err) {
        if (err) throw err;
        if (! isDir) { // should really make this an `if (isFile)` check...
          entry.pipe(fs.createWriteStream(fullpath));
        }
      });
    }
  });
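
For comparison, more recent releases of the `tar` package expose filtering directly: `tar.x()` (alias `tar.extract()`) accepts a `filter(path, entry)` option and handles gzip itself. This is only a minimal sketch, assuming such a release is installed. `wantEntry` and `extractFiltered` are hypothetical helper names, and the `.js` rule is the same illustrative filter as above.

```javascript
// Hypothetical predicate: decide from the entry path alone whether to extract it.
function wantEntry(entryPath) {
  return /\.js$/.test(entryPath); // only extract JS files, for instance
}

// Hypothetical wrapper around tar.x(). Assumes a recent release of the `tar`
// package that supports the `filter` option. `tar` is required lazily so the
// predicate above can be used on its own.
function extractFiltered(tarball, dest) {
  var tar = require('tar');
  return tar.x({
    file: tarball, // archive to read; gzip compression is detected automatically
    cwd: dest,     // destination directory; it must already exist
    filter: function (entryPath) {
      // return true to unpack the entry, false to skip it
      return wantEntry(entryPath);
    }
  });
}
```

Skipped entries are never written to disk, so nothing has to be deleted afterwards. Called as `extractFiltered(tarball, dest)`, it should return a promise when no callback is passed, which can be used to log completion or errors.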
robertklep
  • @rynop good catch, although I would perform that check before calling `mkdirp()`. – robertklep Apr 01 '15 at 18:52
  • lol ya duh. fixed. Also, I'm seeing the files within the .tar.gz get corrupted on disk after extraction. Have you ever seen this before? Files get onto disk with the correct name and structure, but JARs inside the archive are corrupted. If I omit the "extract only JS files" line and use tar.Extract(), this does not happen – rynop Apr 01 '15 at 19:20
  • @rynop I've used similar code to extract tar files, and it has always worked for me. If you have a sample tar file for me, I'd be happy to take a look. – robertklep Apr 01 '15 at 20:26
  • @robertklep I appreciate it: http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_2015-01-27.tar.gz You can see my code that is working at https://github.com/doapp-ryanp/dynamodb-local/blob/master/index.js#L105 - if I change to `tar.Parse()` and use your logic, the binary files get corrupted (different size) – rynop Apr 02 '15 at 16:16
  • @rynop this seems to work just fine for me: https://gist.github.com/robertklep/01dd2c64a0fe9f5483d5 (feel free to leave comments there instead of here :-) – robertklep Apr 02 '15 at 17:22
  • If rynop is on Windows and robertklep is on a *nix-like system, then the report of corrupted files makes sense. File paths have to be normalized: \\ on Windows, / on *nix. – bigkahunaburger Mar 19 '17 at 00:07
0

You can take a look at this post to find a good solution.

By the way, the zlib documentation shows that you can pass a buffer when calling .unzip().
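
As a rough sketch of what the buffer form looks like: `zlib.unzip(buffer, callback)` inflates a whole Buffer in one call. The round trip below uses made-up sample data, and `unzipBuffer` is a hypothetical helper name. Note that this keeps both the compressed and the decompressed data in memory, and for a .tar.gz the result is still a raw tar stream that needs to be parsed.

```javascript
var zlib = require('zlib');

// Hypothetical helper: decompress a whole gzip/deflate Buffer in memory.
function unzipBuffer(compressed, callback) {
  zlib.unzip(compressed, callback); // callback(err, resultBuffer)
}

// Sample data for illustration only, compressed in memory.
var compressed = zlib.gzipSync(Buffer.from('hello tarball'));

unzipBuffer(compressed, function (err, result) {
  if (err) throw err;
  console.log(result.toString());
});
```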

Luca Davanzo
  • No, the archive can be quite large and I don't want to allocate that much RAM. The memory footprint must be kept to a minimum. Besides, the archive contains a directory structure, so what you are proposing does not apply. – Yanick Rochon Feb 24 '14 at 13:55
  • I have to read an extracted file from the tar. When should I call my read function? I tried onClose but my file is not fully written till then. You can see my code. – Rajeev Raina Feb 06 '18 at 10:43
  • fs.createReadStream(filename) .pipe(zlib.Unzip()) .pipe(new tar.Parse()) .on('entry', function(entry) { { var isDir = 'Directory' === entry.type; var fullpath = path.join(dest, entry.path); var directory = isDir ? fullpath : path.dirname(fullpath); mkdirp(directory, function(err) { if (err) throw err; if (! isDir) { entry.pipe(fs.createWriteStream(fullpath) .on('error', function(e){alert('Error');}) ); }});} }).on('close', function(){setTimeout(readXMLFile(sysObject.path + '\\layout_new.xml'),0);}) – Rajeev Raina Feb 06 '18 at 10:44
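
On the timing question above: a write stream emits 'finish' once all of its data has been flushed, so that is a safer point to read the file back than the parser's 'close' event. Note also that `setTimeout(readXMLFile(...), 0)` calls `readXMLFile` immediately and passes its return value to `setTimeout`; a function has to be passed instead. Here is a minimal sketch using only core modules, where `writeThenRead` and the temp file name are hypothetical and a `PassThrough` stream stands in for a tar entry.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var stream = require('stream');

// Hypothetical helper: pipe `source` to `fullpath`, then read the file back
// only after the write stream reports that all data has been flushed.
function writeThenRead(source, fullpath, callback) {
  var out = fs.createWriteStream(fullpath);
  out.on('error', callback);
  out.on('finish', function () {
    fs.readFile(fullpath, 'utf8', callback);
  });
  source.pipe(out);
}

// Stand-in for a tar entry stream (illustrative data only).
var entry = new stream.PassThrough();
var target = path.join(os.tmpdir(), 'layout_new_example.xml');

writeThenRead(entry, target, function (err, contents) {
  if (err) throw err;
  console.log(contents);
});

entry.end('<layout/>');
```

In the tar case, the same idea applies per entry: attach the 'finish' handler to the write stream created for the file you need, rather than to the parser.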