I would like to efficiently read the last X bytes of a very large file using node.js. What's the most efficient way of doing so?

As far as I know, the only way of doing this is to create a read stream and loop until I hit the byte index.

Example:

// Let's assume I want the last 10 bytes.
// I would open a stream and loop until I reach the end of the file.
// Once done, I would look at the last 10 bytes I kept in memory.
const fs = require('fs');

let f = fs.createReadStream('file.xpto'); // which is a 1 GB file
let data = [];

f.on('data', function (chunk) {
    for (const d of chunk) {
        data.push(d);
        data = data.slice(-10); // keep only the last 10 elements
    }
});
f.on('end', function () {
    // check data
    console.log('Last ten bytes are', data);
});
f.resume();
Patrick Roberts
Lothre1
  • This is far too broad, and doesn't show any work you've done. It reads as "write code for me," and that's not how StackOverflow works. As written, unfortunately, this is off-topic. – David Makogon Feb 21 '18 at 22:44
  • It isn't broad. That's very specific. – Lothre1 Feb 21 '18 at 22:46
  • Prior to you adding any code to your question, yes, your question was broad. Meaning, there is more than one way to do it. Lots of potential approaches. And no specific right answer. And even after you added your code, it's still broad, still lots of ways to solve it. – David Makogon Feb 21 '18 at 23:05
  • @PatrickRoberts a close-vote doesn't, by itself, cause a question to be closed. But yes, it was certainly a candidate for a close-vote, as originally posted (just look at the edit history: no code, just "what's the most efficient way"). – David Makogon Feb 21 '18 at 23:09
  • @Lothre1 Don't update your question with an answer. If you want to follow up with your own answer below, that's fine, but also note that it is not considered polite to later accept your own answer instead if it is based on someone else's work (such as Arash). – Patrick Roberts Feb 21 '18 at 23:16
  • As you know the question specifically tells how to achieve it having performance in mind. Maybe I lacked the example, which obviously showed I was not doing it right. Doing it right is actually what Arash showed to be by guiding me to the right method. – Lothre1 Feb 21 '18 at 23:17
  • @PatrickRoberts fixed ;) – Lothre1 Feb 21 '18 at 23:20

3 Answers

You essentially want to seek to a certain position in the file. There's a way to do that. Please consult this question and the answers:

seek() equivalent in javascript/node.js?

Essentially, determine the starting position (using the file length from its metadata and the number of bytes you're interested in) and use one of the following approaches to read - as stream or via buffers - the portion you're interested in.


Using fs.read

fs.read(fd, buffer, offset, length, position, callback)

position is an argument specifying where to begin reading from in the file.
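
As a minimal sketch of this approach, the snippet below uses the synchronous counterparts (fs.openSync, fs.fstatSync, fs.readSync), which take the same arguments as fs.read minus the callback. The helper name readLastBytesSync and its parameters are made up for illustration and are not part of the Node.js API. It assumes the file does not change size while being read.

```javascript
const fs = require('fs');

// Hypothetical helper: returns the last `count` bytes of `filePath` as a Buffer.
function readLastBytesSync(filePath, count) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const { size } = fs.fstatSync(fd);
    // Clamp so that files shorter than `count` are read from byte 0.
    const length = Math.min(count, size);
    const position = size - length;
    const buffer = Buffer.alloc(length);
    // Same argument order as fs.read: fd, buffer, offset into buffer, length, file position.
    const bytesRead = fs.readSync(fd, buffer, 0, length, position);
    return buffer.subarray(0, bytesRead);
  } finally {
    // Always release the file descriptor, even if the read throws.
    fs.closeSync(fd);
  }
}
```

Only the requested bytes are read from disk, so the cost should not depend on the total file size.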


Using fs.createReadStream

Alternatively, if you want to use the createReadStream function, then specify the start and end options: https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options

fs.createReadStream(path[, options])

options can include start and end values to read a range of bytes from the file instead of the entire file.
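
One detail worth illustrating is that both start and end are inclusive byte offsets, so the last valid index is the file size minus one. The sketch below collects whatever chunks the stream emits into a single Buffer. The helper name streamLastBytes is hypothetical, and the code assumes a Node.js version where readable streams support for await.

```javascript
const fs = require('fs');

// Hypothetical helper: resolves with the last `count` bytes of `filePath`.
async function streamLastBytes(filePath, count) {
  const { size } = await fs.promises.stat(filePath);
  // Nothing to read from an empty file or for a non-positive count.
  if (size === 0 || count <= 0) return Buffer.alloc(0);
  const start = Math.max(0, size - count);
  // `end` is inclusive, so the last readable offset is size - 1.
  const stream = fs.createReadStream(filePath, { start, end: size - 1 });
  const chunks = [];
  // A large range may arrive in several chunks, so gather them all.
  for await (const chunk of stream) chunks.push(chunk);
  return Buffer.concat(chunks);
}
```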

Arash Motamedi

Here's sample code based on Arash Motamedi's answer. It lets you read the last 10 bytes of a very large file in a few milliseconds.

const fs = require('fs');

const _path = 'my-very-large-file.xpto';
const stats = fs.statSync(_path);

const size = stats.size;
const sizeStart = Math.max(0, size - 10); // don't seek before byte 0 on a short file
const sizeEnd = size - 1;                 // `end` is inclusive, so stop at the last byte


const options = {
    start: sizeStart,
    end: sizeEnd
};
const stream = fs.createReadStream(_path, options);
stream.on('data', (data) => {
    console.log({ data });
});
stream.resume();
Lothre1

For a promise-based version of the fs.read solution:

import FS from 'fs/promises';

// `path` is the file to read; `bytesToRead` is the x bytes you want from the end
async function getLastXBytesBuffer(path, bytesToRead = 1024) {
  const handle = await FS.open(path, 'r');
  try {
    const { size } = await handle.stat();

    // Never read more than the file holds, and never seek before byte 0
    const length = Math.min(bytesToRead, size);

    // Calculate the position x bytes from the end
    const position = size - length;

    // Get the resulting buffer
    const { buffer, bytesRead } = await handle.read(Buffer.alloc(length), 0, length, position);

    return buffer.subarray(0, bytesRead);
  } finally {
    // Don't forget to close the file handle, even if the read fails
    await handle.close();
  }
}

Justin Dalrymple