
I am able to traverse a directory recursively (i.e. explore all the subdirectories and files in it). For that I used an answer from a related post on Stack Overflow. The snippet is below:

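var fs = require("fs");

// Build a nested { path, children } object describing everything under dir,
// then call done(err, results) once every entry has been processed.
var tree = function(dir, done) {
  var results = {
    "path": dir,
    "children": []
  };
  fs.readdir(dir, function(err, list) {
    if (err) { return done(err); }
    // number of entries still being processed; done fires when it reaches 0
    var pending = list.length;
    if (!pending) { return done(null, results); }
    list.forEach(function(file) {
      fs.stat(dir + '/' + file, function(err, stat) {
        if (stat && stat.isDirectory()) {
          // subdirectory: recurse and attach its whole subtree
          tree(dir + '/' + file, function(err, res) {
            results.children.push(res);
            if (!--pending) { done(null, results); }
          });
        } else {
          // plain file: record only its path
          results.children.push({"path": dir + "/" + file});
          if (!--pending) { done(null, results); }
        }
      });
    });
  });
};

module.exports = tree;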

When I run:

tree(someDirectoryPath, function(err, results) {
  if (err) throw err;

  console.log(results);
});

I get a result such as this one:

{ path: '/Users/UserName/Desktop/1',
  children: 
   [ { path: '/Users/UserName/Desktop/1/file1' },
     { path: '/Users/UserName/Desktop/1/file2' },
     { path: '/Users/UserName/Desktop/1/file3' },
     { path: '/Users/UserName/Desktop/1/subdir1',
       children: [Object] } ] }
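The `children: [Object]` entry is not missing data: console.log stops expanding nested objects after a couple of levels (util.inspect's default depth is 2). A small sketch, using a hand-built object in the same shape that tree() returns (the paths are made up), showing how to print every level:

```javascript
const util = require('util');

// Hand-built object in the same { path, children } shape tree() produces
const sample = {
  path: '/tmp/1',
  children: [
    { path: '/tmp/1/file1' },
    {
      path: '/tmp/1/subdir1',
      children: [
        { path: '/tmp/1/subdir1/deep', children: [{ path: '/tmp/1/subdir1/deep/file9' }] }
      ]
    }
  ]
};

// Default depth: deeper levels collapse to [Array] / [Object]
console.log(util.inspect(sample));

// depth: null recurses without limit, so every nested child is printed
console.log(util.inspect(sample, { depth: null }));
```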

I am also able to hash a single file at a specific location by using the fs module's ReadStream. The snippet for that is below:

/**
 * Checking File Integrity
 */
var fs = require('fs'),
    args = process.argv.slice(2),
    path = require('path'),
    traverse = require('/Users/UserName/Desktop/tree.js'),
    crypto = require('crypto');
//var algorithm = ['md5', 'sha1', 'sha256', 'sha512'];
var algorithm = 'sha512';
var hashTable = new Array();

var hash = crypto.createHash(algorithm);

var fileStream = fs.createReadStream(args[0]);

// feed every chunk of the file into the hash
fileStream.on('data', function(data) {
    hash.update(data);
});

// the whole file has been read: compute and print the digest
fileStream.on('end', function() {
    var digest = hash.digest('hex');
    console.log('algorithm used: ', algorithm);
    console.log('hash for the file: ', digest);
    hashTable[args[0]] = digest;
    console.log('the hashtable is: ', hashTable);
});

Here args[0] holds the path of the file to be read by the ReadStream. After hashing a specific file, the console output is as follows:

node fileIntegrityChecker.js hello.txt
algorithm used:  sha512
hash for the file:  9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043
the hashtable is: [ 'hello.txt': '9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043' ]
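Reading the file as a stream and feeding each chunk to hash.update gives the same digest as hashing the whole file in one call, because a hash can be updated incrementally. A minimal sketch comparing the two; hashFileSync and hashFileStream are hypothetical helper names, not part of Node's API:

```javascript
const crypto = require('crypto');
const fs = require('fs');

// One-shot version: read the whole file into memory and hash it in one call.
// Fine for small files; streaming avoids holding large files in memory.
function hashFileSync(file, algorithm) {
  return crypto.createHash(algorithm).update(fs.readFileSync(file)).digest('hex');
}

// Streaming version: the 'end' listener is registered once, outside the
// 'data' handler, and read errors are reported instead of being ignored.
function hashFileStream(file, algorithm, callback) {
  const hash = crypto.createHash(algorithm);
  const stream = fs.createReadStream(file);
  stream.on('data', (chunk) => hash.update(chunk));
  stream.on('error', (err) => callback(err));
  stream.on('end', () => callback(null, hash.digest('hex')));
}
```

Either function can be called with any algorithm name that crypto.createHash accepts, such as 'sha256' or 'sha512'.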

My problem is that I tried to integrate the tree module's functionality into the hashing script. My idea is that the program will capture the user's input as a path to a directory, and that input will be processed to traverse all the subdirectories and files of the folder. The fileStream.on handlers should also be placed inside the callback passed to the tree module. However, I am not very familiar with the callback mechanism yet and I hope to get some insight from you.

This is what I've tried:

/**
 * Checking File Integrity
 */
var fs = require('fs'),
    args = process.argv.slice(2),
    path = require('path'),
    tree = require('/Users/UserName/Desktop/tree.js'),
    crypto = require('crypto');
//var algorithm = ['md5', 'sha1', 'sha256', 'sha512'];
var algorithm = 'sha512';
var hashTable = new Array();

var pathString = '/Users/UserName/Desktop/1';
tree(pathString, function(err, results) {
    if (err) throw err;

    // hashes only results.children[1], which is picked explicitly
    var hash = crypto.createHash(algorithm);
    var fileStream = fs.createReadStream(results.children[1]['path']);
    fileStream.on('data', function(data) {
        hash.update(data);
    });
    fileStream.on('end', function() {
        var digest = hash.digest('hex');
        console.log('algorithm used: ', algorithm);
        console.log('hash for the file: ', digest);
        hashTable[results.children[1]['path']] = digest;
        console.log('The hashtable is: ', hashTable);
    });
});

Now I've made some progress in the sense that I no longer receive an error, and basically I achieved my goal. However, I am able to extract only one result, which I pick explicitly (results.children[1]). I cannot think of how to iterate over each child of the result object, including the children of subdirectories. If that is solved, I think the problem will be completely solved.
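Because the result is a tree, iterating over it is itself recursive: a node with a children array is a directory, and a node without one is a file. A minimal sketch that flattens the object returned by tree() into a list of file paths; collectFiles is a hypothetical helper name:

```javascript
// Recursively walk the object returned by tree() and collect every file path.
// Nodes with a `children` array are directories; nodes without one are files.
function collectFiles(node, out) {
  out = out || [];
  if (Array.isArray(node.children)) {
    node.children.forEach(function (child) {
      collectFiles(child, out);
    });
  } else {
    out.push(node.path);
  }
  return out;
}

// Possible usage with the tree module from the question (not run here):
// tree(pathString, function (err, results) {
//   if (err) throw err;
//   collectFiles(results).forEach(function (file) { /* hash file */ });
// });
```

Each path in the returned array can then be passed to the same ReadStream hashing code, one file at a time.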

Can you please show me how to combine the module and the script so that the program recursively traverses all the contents of a directory and creates a hash for every file in it? Ultimately I need this to check whether any of the files have changed, based on their hashes. Thank you!
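Checking for changes comes down to comparing a hash table saved earlier with a freshly computed one. A minimal sketch, assuming both tables are plain objects of the form { filePath: hexDigest }; compareHashes is a hypothetical helper name. A plain object ({}) is a safer container than new Array() if the table is saved with JSON.stringify between runs, because JSON.stringify drops the non-index keys of an array.

```javascript
// Compare two { filePath: hexDigest } tables, e.g. one saved from an earlier
// run and one freshly computed, and report which files were added, removed
// or changed.
function compareHashes(oldTable, newTable) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const added = [];
  const removed = [];
  const changed = [];
  Object.keys(newTable).forEach(function (file) {
    if (!has(oldTable, file)) {
      added.push(file);
    } else if (oldTable[file] !== newTable[file]) {
      changed.push(file);
    }
  });
  Object.keys(oldTable).forEach(function (file) {
    if (!has(newTable, file)) {
      removed.push(file);
    }
  });
  return { added: added, removed: removed, changed: changed };
}
```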

Community
v01d
  • Callback looks fine. EISDIR means you are trying to do an operation on a directory when a different file type is expected. Have you traced which line throws the error? – chriskelly Sep 27 '15 at 17:46
  • The line tree(someDirectoryPath, function(err, results) { gives the error. So the problem may be in how I pass the variable someDirectoryPath to the tree() function. – v01d Sep 27 '15 at 17:55
  • @chriskelly I have made some changes, can you please check them? – v01d Sep 27 '15 at 18:37
  • Checked. Let me know if my answer is clear. – chriskelly Sep 27 '15 at 19:33
  • Give me a few minutes please. I am trying to update my code. I'll let you know if something is not clear. Thanks. – v01d Sep 27 '15 at 19:39
  • Do you really need recursion? It seems like globbing the files into a stream and piping them through your hashing function would be sufficient. – Josh C. Sep 27 '15 at 20:17
  • @JoshC. Yeah, I need recursion. Could you provide sample code showing how your suggested solution would work? – v01d Sep 27 '15 at 20:41
  • @v01d I offered an answer. – Josh C. Sep 28 '15 at 14:14

3 Answers


The simplest thing to do would be to generate the hash while you are already walking the directory tree. This involves updating the tree.js file as follows:

    } else {
      var fname = dir + "/" + file;
      // put your hash generation here
      generateHash(fname, function (e, hash) {
        if (e) { return done(e); }

        results.children.push({"path": fname, "hash": hash});
        if (!--pending) {
          done(null, results);
        }
      });
    }

Then put your hash generation code in a function like this:

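// tree.js already requires fs; generateHash also needs crypto
var crypto = require('crypto');

function generateHash(filename, callback) {
    var algorithm = 'sha512';
    var hash = crypto.createHash(algorithm);
    var fileStream = fs.createReadStream(filename);

    // feed each chunk into the hash as it arrives
    fileStream.on('data', function(data) {
        hash.update(data);
    });
    // report read errors (e.g. permission denied) instead of never calling back
    fileStream.on('error', function(err) {
        callback(err);
    });
    // all chunks consumed: hand the hex digest to the caller
    fileStream.on('end', function() {
        var digest = hash.digest('hex');
        callback(null, digest);
    });
}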
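Putting the patch and generateHash together, a complete tree.js could look like the sketch below. Two details are assumptions added here, not part of the answer: paths are built with path.join, and a small guard makes sure done is called at most once, so that several failing files do not each report an error.

```javascript
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Stream a file through a SHA-512 hash and call back with (err, hexDigest)
function generateHash(filename, callback) {
  const hash = crypto.createHash('sha512');
  const fileStream = fs.createReadStream(filename);
  fileStream.on('data', function (data) { hash.update(data); });
  fileStream.on('error', callback);
  fileStream.on('end', function () { callback(null, hash.digest('hex')); });
}

// Recursive walk that hashes each file as it is found
function tree(dir, done) {
  // Guard (not in the answer): make sure done fires at most once
  let called = false;
  function finish(err, res) {
    if (called) return;
    called = true;
    done(err, res);
  }

  const results = { path: dir, children: [] };
  fs.readdir(dir, function (err, list) {
    if (err) return finish(err);
    let pending = list.length;
    if (!pending) return finish(null, results);
    list.forEach(function (file) {
      const fname = path.join(dir, file);
      fs.stat(fname, function (err, stat) {
        if (err) return finish(err);
        if (stat.isDirectory()) {
          tree(fname, function (err, res) {
            if (err) return finish(err);
            results.children.push(res);
            if (!--pending) finish(null, results);
          });
        } else {
          generateHash(fname, function (err, hash) {
            if (err) return finish(err);
            results.children.push({ path: fname, hash: hash });
            if (!--pending) finish(null, results);
          });
        }
      });
    });
  });
}

module.exports = tree;
```

The order of entries in each children array follows the order in which the asynchronous calls finish, so it can differ between runs.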
chriskelly
  • When I try to run your code, the following error appears: binding.open(pathModule._makeLong(path), ^ TypeError: path must be a string. The error points to the line var fileStream = fs.ReadStream(filename); – v01d Sep 27 '15 at 20:07
  • Apparently I get a memory leak warning: (node) warning: possible EventEmitter memory leak detected. 11 end listeners added. Use emitter.setMaxListeners() to increase limit. crypto.js:126 return this._handle.digest(outputEncoding); ^ Error: Not initialized at Error (native) at Hash.digest (crypto.js:126:23) at ReadStream.<anonymous> (/Users/MacriniciDan/Desktop/tree2.js:17:31) at ReadStream.emit (events.js:129:20) at _stream_readable.js:908:16 at process._tickCallback (node.js:355:11) – v01d Sep 27 '15 at 20:18
    Thank you! It works now. Thank you for all of your effort and solution! – v01d Sep 27 '15 at 20:45
  • Interesting link between the callback(null, digest) line and function(e, hash). I think I understand callbacks better now. Thanks once more! – v01d Sep 27 '15 at 20:49
import crypto from 'crypto';
import fs from 'fs';
import path from 'path';

// walk dir recursively
function* walkSync(dir: string) {
  const files = fs.readdirSync(dir, { withFileTypes: true });
  for (const file of files) {
    if (file.isDirectory()) {
      yield* walkSync(path.join(dir, file.name));
    } else {
      yield path.join(dir, file.name);
    }
  }
}

// concat all files hashes and hash the hashes
function dirHash(dir: string) {
  const hexes = [];
  for (const file of walkSync(dir)) {
    const buffer = fs.readFileSync(file);
    const hash = crypto.createHash('sha256');
    hash.update(buffer);
    const hex = hash.digest('hex');
    hexes.push(hex);
  }
  return crypto.createHash('sha256').update(hexes.join('')).digest('hex');
}


console.log(dirHash('./src'));
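dirHash returns a single digest for the whole directory: it tells you that something changed, but not which file, and file names are not part of its input. If per-file results are needed, one variation (sketched in JavaScript to match the rest of the thread; hashTreeSync and combinedDigest are hypothetical names) keeps a { relativePath: digest } table and sorts the paths before combining them, so the combined digest does not depend on readdir order and also changes when a file is renamed.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Recursively yield every file path below dir (same idea as walkSync above)
function* walkSync(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* walkSync(full);
    } else {
      yield full;
    }
  }
}

// Build { relativePath: sha256 hex } for every file under dir
function hashTreeSync(dir) {
  const table = {};
  for (const file of walkSync(dir)) {
    const hex = crypto.createHash('sha256').update(fs.readFileSync(file)).digest('hex');
    table[path.relative(dir, file)] = hex;
  }
  return table;
}

// Combine into one digest: sort paths and include each path with its hash
function combinedDigest(table) {
  const h = crypto.createHash('sha256');
  for (const rel of Object.keys(table).sort()) {
    h.update(rel + '\0' + table[rel] + '\n');
  }
  return h.digest('hex');
}
```

Saving the table from hashTreeSync between runs also makes it possible to report exactly which files differ.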

Using vinyl-fs, you could glob a directory. This will probably cut down on your code quite a bit.

Then you would pipe the files through a handler that would generate your hash.

Here's an example:

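var vfs = require('vinyl-fs');

// hasher and concater are placeholders for your own object-mode streams
vfs.src(['./**/*.js'])
  .pipe(hasher)
  .pipe(concater)
  .pipe(vfs.dest('./output'));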
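The answer leaves hasher undefined. One possible shape, assuming each item flowing through the pipeline is a file object with a path and a Buffer contents (which is what vinyl files from vfs.src() carry by default); createHasher is a hypothetical name, and the sketch uses only Node's built-in stream and crypto modules:

```javascript
const { Transform } = require('stream');
const crypto = require('crypto');

// Object-mode transform: takes file objects with `path` and a Buffer
// `contents`, attaches a hex digest as `hash`, and passes them on.
// Items whose contents is not a Buffer (vinyl uses null for directories,
// and a stream when buffering is turned off) pass through unchanged.
function createHasher(algorithm) {
  return new Transform({
    objectMode: true,
    transform(file, _encoding, callback) {
      if (Buffer.isBuffer(file.contents)) {
        file.hash = crypto.createHash(algorithm).update(file.contents).digest('hex');
      }
      callback(null, file);
    }
  });
}

// Possible usage (not run here):
// vfs.src(['./**/*.js']).pipe(createHasher('sha512'))
```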
Josh C.