I'm just learning JavaScript, and a common task I perform when picking up a new language is to write a hex-dump program. The requirements are: 1. read the file supplied on the command line, 2. be able to read huge files (reading a buffer at a time), 3. output the hex digits and printable ASCII characters.

Try as I might, I can't get the fs.read(...) function to actually execute. Here's the code I've started with:

    console.log(process.argv);
    if (process.argv.length < 3) {
        console.log("usage: node hd <filename>");
        process.exit(1);
    }
    fs.open(process.argv[2], 'r', (err,fd) => {
        if (err) {
            console.log("Error: ", err);
            process.exit(2);
        } else {
            fs.fstat(fd, (err,stats) => {
                if (err) {
                    process.exit(4);
                } else { 
                    var size = stats.size;
                    console.log("size = " + size);
                    going = true;
                    var buffer = new Buffer(8192);
                    var offset = 0;
                    //while( going ){
                    while( going ){
                        console.log("Reading...");
                        fs.read(fd, buffer, 0, Math.min(size-offset, 8192), offset, (error_reading_file, bytesRead, buffer) => {
                            console.log("READ");
                            if (error_reading_file)
                            {
                                console.log(error_reading_file.message);
                                going = false;
                            }else{
                                offset += bytesRead;
                                for (a=0; a< bytesRead; a++) {
                                    var z = buffer[a];
                                    console.log(z);
                                }
                                if (offset >= size) {
                                    going = false;
                                }
                            }
                        });
                    }
                    //}
                    fs.close(fd, (err) => {
                        if (err) {
                            console.log("Error closing file!");
                            process.exit(3);
                        }
                    });
                }
            });
        }
    });

If I comment-out the while() loop, the read() function executes, but only once of course (which works for files under 8K). Right now, I'm just not seeing the purpose of a read() function that takes a buffer and an offset like this... what's the trick?

Node v8.11.1, OSX 10.13.6

Jupe
  • Why not debugging your program to see what is happening? – Jeroen Heier Oct 04 '18 at 03:53
  • @JeroenHeier Just tried that, and it just skips over the fs.read() function and loops. – Jupe Oct 04 '18 at 03:58
  • Well `fs.read()` is asynchronous, thus it will not work the way you want it to in a `while()` loop. I'd suggest you use [`fs.createReadStream()`](https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options) and the `data` event from that stream. The data will then be fed to you in a series of events. And, to do anything useful in node.js, you really need to understand the concept of asynchronous I/O and how you program with that. – jfriend00 Oct 04 '18 at 05:02
  • @jfriend00 Thanks, that's very helpful. I'll refactor for the createReadStream() method... but I'm still curious what value the fs.read() method has? – Jupe Oct 04 '18 at 11:24
  • Do you understand that `fs.read()` is non-blocking? You call it, it starts a read operation, it immediately returns, and your `while()` loop keeps executing, calling a whole bunch more `fs.read()` operations. In fact, the `while` loop keeps going forever and, because it never stops, it never gives the event loop a chance to process any of your `fs.read()` completion events. As I said above, you can't place non-blocking operations inside a `while` loop and program as if they were blocking. – jfriend00 Oct 04 '18 at 21:26
  • See [Infinite while loop](https://stackoverflow.com/questions/22125865/wait-until-flag-true/22125978#22125978) and [Why does a while loop block the event loop](https://stackoverflow.com/questions/34824460/why-does-a-while-loop-block-the-event-loop/34825352#34825352) for more explanation. – jfriend00 Oct 04 '18 at 21:26

1 Answer

First of all, if this is just a one-off script that you run now and then and this is not code in a server, then there's no need to use the harder asynchronous I/O. You can use synchronous, blocking I/O with calls such as fs.openSync(), fs.statSync(), fs.readSync(), etc., and then things will work inside your while loop because those calls are blocking (they don't return until the results are ready). You can write normal looping and sequential code with them. One should never use synchronous, blocking I/O in a server environment because it ruins the scalability of a server process (its ability to handle requests from multiple clients), but if this is a one-off local script with only one job to do, then synchronous I/O is perfectly appropriate.

Second, here's why your code doesn't work properly. Javascript in node.js is single-threaded and event-driven. That means that the interpreter pulls an event out of the event queue, runs the code associated with that event and does nothing else until that code returns control back to the interpreter. At that point, it then pulls the next event out of the event queue and runs it.

When you do this:

    while (going) {
        fs.read(..., (err, bytesRead, buffer) => {
            // some logic here that may change the value of the going variable
        });
    }

You've just created yourself an infinite loop. This is because the while(going) loop just runs forever. It never stops looping and never returns control back to the interpreter so that it can fetch the next event from the event queue. It just keeps looping. But, the completion of the asynchronous, non-blocking fs.read() comes through the event queue. So, you're waiting for the going flag to change, but you never allow the system to process the events that can actually change the going flag. In your actual case, you will probably eventually run out of some sort of resource from calling fs.read() too many times in a tight loop or the interpreter will just hang in an infinite loop.

Programming repetitive, looping tasks that involve asynchronous operations requires learning some new techniques. Since much I/O in node.js is asynchronous and non-blocking, this is an essential skill to develop for node.js programming.

There are a number of different ways to solve this:

  1. Use fs.createReadStream() and read the file by listening for the data event. This is probably the cleanest scheme. If your objective here is to write a hex outputter, you might even want to learn a stream feature called a transform, where you transform the binary stream into a hex stream.

  2. Use promise versions of all the relevant fs functions here and use async/await to allow your loop to wait for an async operation to finish before going to the next iteration. This allows you to write synchronous-looking code, but use async I/O.

  3. Write a different type of looping construct (not using a while loop) that manually repeats the loop after fs.read() completes.


Here's a simple example using fs.createReadStream():

const fs = require('fs');

function convertToHex(val) {
    let str = val.toString(16);
    if (str.length < 2) {
        str = "0" + str;
    }
    return str.toUpperCase();
}

let stream = fs.createReadStream(process.argv[2]);
let outputBuffer = "";
stream.on('data', (data) => {
    // you get an unknown length chunk of data from the file here in a Buffer object
    for (const val of data) {
        outputBuffer += convertToHex(val) + " ";
        if (outputBuffer.length > 100) {
            console.log(outputBuffer);
            outputBuffer = "";
        }
    }
}).on('error', err => {
    // some sort of error reading the file
    console.log(err);
}).on('end', () => {
    // output any remaining buffer
    console.log(outputBuffer);
});

Hopefully you will notice that, because the stream handles opening, closing and reading from the file for you, this is a much simpler way to code. All you have to do is supply event handlers for data that is read, a read error and the end of the operation.


Here's a version using async/await and the new file interface (where the file descriptor is an object that you call methods on) with promises in node v10.

const fs = require('fs').promises;

function convertToHex(val) {
    let str = val.toString(16);
    if (str.length < 2) {
        str = "0" + str;
    }
    return str.toUpperCase();
}

async function run() {
    const readSize = 8192; 
    let cntr = 0;
    const buffer = Buffer.alloc(readSize);
    const fd = await fs.open(process.argv[2], 'r');
    try {
        let outputBuffer = "";
        while (true) {
            let data = await fd.read(buffer, 0, readSize, null);
            for (let i = 0; i < data.bytesRead; i++) {
                cntr++;
                outputBuffer += convertToHex(buffer.readUInt8(i)) + " ";
                if (outputBuffer.length > 100) {
                    console.log(outputBuffer);
                    outputBuffer = "";
                }
            }
            // see if all data has been read
            if (data.bytesRead !== readSize) {
                console.log(outputBuffer);
                break;
            }
        }
    } finally {
        await fd.close();
    }
    return cntr;
}

run().then(cntr => {
    console.log(`done - ${cntr} bytes read`);
}).catch(err => {
    console.log(err);
});
jfriend00
  • Thanks! Your original explanation was enough to point me in the right direction, and I did get the hexdump program working with a callback-based model! But I'm very thankful for the detailed explanation here. – Jupe Oct 07 '18 at 04:01