749

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject but I'm missing some connections to make the whole thing fit together.

 var Lazy=require("lazy");
 new Lazy(process.stdin)
     .lines
     .forEach(
          function(line) { 
              console.log(line.toString()); 
          }
 );
 process.stdin.resume();

The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN as in this sample.

I tried:

 fs.open('./VeryBigFile.csv', 'r', '0666', Process);

 function Process(err, fd) {
    if (err) throw err;
    // DO lazy read 
 }

but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.

I don't think the other answer would work as the file is much larger than the server I'm running it on has memory for.

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
Alex C
  • 16,624
  • 18
  • 66
  • 98
  • 3
    This turns out to be quite difficult using just low-level `fs.readSync()`. You can read binary octets into a buffer but there's no easy way to deal with partial UTF-8 or UTF-16 characters without inspecting the buffer before translating it to JavaScript strings and scanning for EOLs. The `Buffer()` type doesn't have as rich a set of functions to operate on its instances as native strings, but native strings cannot contain binary data. It seems to me that lacking a built-in way to read text lines from arbitrary filehandles is a real gap in node.js. – hippietrail Jan 02 '13 at 01:52
  • 5
    Empty lines read in by this method get converted to a line with a single 0 (actual character code for 0) in them. I had to hack this line in there: `if (line.length==1 && line[0] == 48) special(line);` – Thabo Aug 09 '13 at 15:28
  • 2
    One might also use the 'line-by-line' package which does the job perfectly. – Patrice Feb 07 '14 at 08:49
  • 1
    Please update the question to say that the solution is to use a [transform stream](http://strongloop.com/strongblog/practical-examples-of-the-new-node-js-streams-api/) – Gabriel Llamas Jun 08 '14 at 06:33
  • You may want to update the question with the [built-in way to read lines from a file](http://stackoverflow.com/a/32599033/1269037) as of Node v0.12. – Dan Dascalescu Sep 16 '15 at 03:08
  • 2
    @DanDascalescu if you like you can add this to the list: your example landed slightly modified in `node`'s API docs https://github.com/nodejs/node/pull/4609 – eljefedelrodeodeljefe Jan 11 '16 at 19:47
  • @eljefedelrodeodeljefe - That's pretty cool! Thanks for doing that :) – Alex C Jan 11 '16 at 19:50
  • @AlexC welcome. :) This post was really helpful, so... – eljefedelrodeodeljefe Jan 17 '16 at 01:49
  • See also Quora: https://www.quora.com/What-is-the-best-way-to-read-a-file-line-by-line-in-node-js – hippietrail Feb 09 '16 at 10:44

30 Answers

1099

Since Node.js v0.12 and as of Node.js v4.0.0, there is a stable readline core module. Here's the easiest way to read lines from a file, without any external modules:

const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
  const fileStream = fs.createReadStream('input.txt');

  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  // Note: we use the crlfDelay option to recognize all instances of CR LF
  // ('\r\n') in input.txt as a single line break.

  for await (const line of rl) {
    // Each line in input.txt will be successively available here as `line`.
    console.log(`Line from file: ${line}`);
  }
}

processLineByLine();

Or alternatively:

var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('file.in')
});

lineReader.on('line', function (line) {
  console.log('Line from file:', line);
});

lineReader.on('close', function () {
    console.log('all done, son');
});

The last line is read correctly (as of Node v0.12 or later), even if there is no final \n.

UPDATE: this example has been added to Node's API official documentation.

d512
  • 32,267
  • 28
  • 81
  • 107
Dan Dascalescu
  • 143,271
  • 52
  • 317
  • 404
  • Thanks for the update Dan! I've not tested this, but have swapped the correct answer (with my fingers crossed) so people visiting can choose the latest info. The perils of a 4 year old question! (Necro question?) – Alex C Sep 16 '15 at 12:51
  • 8
    you need a terminal:false in the createInterface definition – glasspill Sep 17 '15 at 13:06
  • 1
    Thanks for the comment @glasspill if you can't edit the answer yourself can you confirm that you mean the code should read `.createInterface({ input: require('fs').createReadStream('file.in'), terminal: false });` ? Dan or I will be happy to update - maybe with a /* comment */ explaining why it has to be there – Alex C Sep 17 '15 at 15:31
  • @glasspill: the code works just fine without `terminal: false`. Can you elaborate on why it would need to be there? – Dan Dascalescu Sep 17 '15 at 17:59
  • 1
    @DanDascalescu In some instances this needs a `terminal :false` as @glasspill has mentioned. I myself require it when running a script from file with both Node.js 0.12.7 and Node.js 4.0.0, as otherwise I get an error regarding `isTTY` being undefined. The `readline` package is specifically designed to run in terminal, so in my code I need to configure it so that it can be used from a script. – Cameron Sep 17 '15 at 19:16
  • @DanDascalescu, the reason is the one bonesbrigade mentioned – glasspill Sep 18 '15 at 13:22
  • 75
    How to determine the last line? By catching a "close" event: `rl.on('close', cb)` – Green Sep 27 '15 at 16:04
  • if I just want to read one line, this method is not good, right? – Yin Oct 10 '15 at 05:43
  • down-vote this answer. This method can't stop the processing gracefully – Yin Oct 10 '15 at 06:15
  • 39
    Readline is for a similar purpose as [GNU Readline](https://cnswww.cns.cwru.edu/php/chet/readline/rltop.html), *not* for reading files line by line. There are several caveats in using it to read files and this is not a best practice. – Nakedible Oct 10 '15 at 11:42
  • 11
    @Nakedible: interesting. Could you post an answer with a better method? – Dan Dascalescu Oct 14 '15 at 23:23
  • 10
    I consider https://github.com/jahewson/node-byline to be the best implementation of line-by-line reading, but opinions may vary. – Nakedible Oct 15 '15 at 10:42
  • 3
    How would one throttle this or at least allow a function to callback prior to grabbing the next line? – Ryan Jan 03 '16 at 05:16
  • 3
    Is there a way to get line number? – Piti Ongmongkolkul Jan 17 '16 at 12:04
  • 2
    @PitiOngmongkolkul use a `var count = 0;` at the top, with a `count++;` at the top of the `on('line')` handler @Ryan use lineReader.pause() at the top of the `on('line')` handler and use lineReader.resume() when you are ready to continue. – JJ Stiff Apr 20 '16 at 19:22
  • 1
    @Jake try `lineReader.pause(); lineReader.close();` Does that work for you? I have yet to need to gracefully stop processing... – JJ Stiff Apr 20 '16 at 19:38
  • Any read line module or npm API seems not working when I write them in jasmine test file or when I include my module (who reads file line by line) in jasmine test file. – Amit Kumar Gupta Jan 14 '17 at 08:52
  • I know this is old , but I asked similar question when I was trying to do it with `readline` , but I end up using `line-by-line` - [here](http://stackoverflow.com/questions/42232026/searching-text-file-with-readline-node-js/42301177#42301177) it is. – user1207289 Feb 27 '17 at 17:29
  • Adding terminal: true to the interface definition actually stops the linereader from reading any more lines once you call .close(), otherwise it keeps reading. – blueprintchris Mar 21 '18 at 11:33
  • Let's say I put an `if` statement around the `console.log` statement. If the program goes into that condition, is there a way to read line by line from there? – Rod Jun 26 '18 at 00:05
  • There is a bug when there is another waiting on a promise returning between the for await loop and createInterface. See https://stackoverflow.com/a/62887022/10694438 – Changdae Park Nov 16 '20 at 04:54
  • Why am I getting result like this for each line? �\u0005\u0003��@^�@^i /\u0018�%\u0006��\u0006�J\u0003y�@.1��7�W\u001a�\u000b\u0006r�������@^0�K\f�� – toadead Dec 27 '21 at 21:03
176

For such a simple operation there shouldn't be any dependency on third-party modules. Go easy.

var fs = require('fs'),
    readline = require('readline');

var rd = readline.createInterface({
    input: fs.createReadStream('/path/to/file'),
    output: process.stdout,
    terminal: false
});

rd.on('line', function(line) {
    console.log(line);
});

rd.on('close', function() {
    console.log('all done, son');
});
d512
  • 32,267
  • 28
  • 81
  • 107
kofrasa
  • 2,090
  • 2
  • 14
  • 10
  • 40
    sadly, this attractive solution doesn't work correctly—`line` events come only after hitting `\n`, i.e., all the alternatives are missed (see http://www.unicode.org/reports/tr18/#Line_Boundaries). #2, data after the last `\n` is silently ignored (see http://stackoverflow.com/questions/18450197/nodejs-readline-missing-last-line-of-file). I'd call this solution *dangerous* because it works for 99% of all files and for 99% of the data but **fails silently** for the rest. Whenever you do `fs.writeFileSync( path, lines.join('\n'))` you've written a file that will only be partly read by the above solution. – flow Aug 27 '13 at 15:56
  • 4
    There is a problem with this solution. If you use your.js – zag2art Jan 11 '14 at 12:31
  • The `readline` package behaves in truly bizarre ways to an experienced Unix/Linux programmer. – Pointy Oct 21 '14 at 23:41
  • 13
    `rd.on("close", ..);` can be used as a callback (occurrs when all lines are read) – Luca Steeb Feb 16 '15 at 23:39
  • 7
    The "data after the last \n" issue seems to be resolved in my version of node (0.12.7). So I prefer this answer, which seems the simplest and most elegant. – Myk Melez Sep 01 '15 at 22:23
  • Thanks @LucaSteeb for the 'close' event mention, had troubles finding that one. – David Thomas Mar 04 '16 at 07:44
  • @flow I believe the issue is fixed—I was able to successfully read the last line. See [github #7238](https://github.com/nodejs/node-v0.x-archive/issues/7238) Credit to AndSmith on the link you provided. – Nathan Goings Nov 14 '21 at 05:40
68

Update in 2019

An awesome example is already posted in the official Node.js documentation, here.

This requires Node.js 11.4 or newer to be installed on your machine.

const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
  const fileStream = fs.createReadStream('input.txt');

  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  // Note: we use the crlfDelay option to recognize all instances of CR LF
  // ('\r\n') in input.txt as a single line break.

  for await (const line of rl) {
    // Each line in input.txt will be successively available here as `line`.
    console.log(`Line from file: ${line}`);
  }
}

processLineByLine();
NFT Master
  • 1,400
  • 12
  • 13
  • 3
    this answer is much better than anything above thanks to its promise-based behaviour, distinctively indicating the EOF. – phil294 Sep 04 '19 at 21:39
  • Thanks, that's sweet. – Goran Stoyanov Oct 23 '19 at 08:42
  • 13
    Maybe this is obvious to others, but it took me a while to debug: if you have any `await`s between the `createInterface()` call and the start of the `for await` loop, you will mysteriously lose lines from the start of the file. `createInterface()` immediately starts emitting lines behind the scenes, and the async iterator implicitly created with `const line of rl` can’t start listening for those lines until it is created. – andrewdotn Nov 09 '19 at 16:09
61

You don't have to open the file; instead, you have to create a ReadStream:

fs.createReadStream

Then pass that stream to Lazy.
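
For example, a minimal sketch combining fs.createReadStream with the Lazy usage shown in the question (reusing the question's file name):

var Lazy = require("lazy"),
    fs = require("fs");

new Lazy(fs.createReadStream('./VeryBigFile.csv'))
    .lines
    .forEach(function(line) {
        console.log(line.toString());
    });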

MichaelJones
  • 1,336
  • 2
  • 12
  • 22
Raynos
  • 166,823
  • 56
  • 351
  • 396
  • You're a star! Thanks Raynos :) perfect. – Alex C May 27 '11 at 19:13
  • 2
    Is there something like an end event for Lazy? When all lines have been read in? – Max Jun 15 '11 at 08:08
  • 1
    @Max, Try: `new lazy(fs.createReadStream('...')).lines.forEach(function(l) { /* ... */ }).join(function() { /* Done */ })` – Cecchi Sep 20 '12 at 19:39
  • 6
    @Cecchi and @Max, don't use join because it will buffer the entire file in memory. Instead, just listen to the 'end' event: `new lazy(...).lines.forEach(...).on('end', function() {...})` – Corin Oct 17 '12 at 12:33
  • @Corin awesome, thanks! Good info, wish it were documented a bit better! – Cecchi Oct 17 '12 at 20:13
  • 3
    @Cecchi, @Corin, and @Max: For what it's worth, I drove myself crazy chaining `.on('end'...` _after_ `.forEach(...)`, when in fact everything behaved as expected when I bound the event _first_. – crowjonah Sep 06 '13 at 19:05
  • 59
    This result is very high on search results, so it is worth noting that Lazy looks abandoned. It has been 7 months without any changes, and has some horrifying bugs (last line ignored, massive memory leaks, etc). – blu Nov 20 '13 at 21:21
  • 1
    People interested in a similar library might go here, though: https://github.com/dtao/lazy.js – Gert Sønderby Jan 09 '15 at 14:38
56
require('fs').readFileSync('file.txt', 'utf-8').split(/\r?\n/).forEach(function(line){
  console.log(line);
})
Yukulélé
  • 15,644
  • 10
  • 70
  • 94
C B
  • 12,482
  • 5
  • 36
  • 48
  • 68
    This will read the *entire file* in memory, then split it into lines. It's not what the question asks. The point is to be able to read large files sequentially, on demand. – Dan Dascalescu Oct 25 '13 at 10:50
  • 8
    This fits my use case, I was looking for a simply way to convert input from one script into another format. Thanks! – Callat Aug 12 '19 at 15:56
  • 3
    This might not answer the original question, but is still useful if it fits your memory constraints. – Kenny Worden Jan 29 '21 at 16:32
43

There is a very nice module for reading a file line by line; it's called line-reader.

With it you simply write:

var lineReader = require('line-reader');

lineReader.eachLine('file.txt', function(line, last) {
  console.log(line);
  // do whatever you want with line...
  if(last){
    // or check if it's the last one
  }
});

You can even iterate the file with a "Java-style" interface, if you need more control:

lineReader.open('file.txt', function(reader) {
  if (reader.hasNextLine()) {
    reader.nextLine(function(line) {
      console.log(line);
    });
  }
});
Aliaksandr Sushkevich
  • 11,550
  • 7
  • 37
  • 44
polaretto
  • 791
  • 9
  • 11
  • 4
    This works well. It even reads the last line (!). It is worth mentioning that it keeps the \r if it is a Windows-style text file. line.trim() does the trick of removing the extra \r. – Pierre-Luc Bertrand Mar 04 '14 at 18:41
  • It's sub-optimal in that input can only be from a named file, and not (for an obvious and extremely important example, `process/stdin`). At least, if it can, it's certainly not obvious from reading the code and attempting it. – Pointy Oct 21 '14 at 23:40
  • 2
    In the meantime there's a built-in way to read lines from a file, using the [`readline` core module](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:04
  • This is old, but in case anyone stumbles upon it: `function(reader)` and `function(line)` should be: `function(err,reader)` and `function(err,line)`. – jallmer Nov 27 '18 at 13:50
  • 3
    Just for the record, `line-reader` reads the file asynchronously. The synchronous alternative to it is `line-reader-sync` – Prajwal Mar 25 '19 at 06:50
19

You can always roll your own line reader. I haven't benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n'.

var last = "";

process.stdin.on('data', function(chunk) {
    var lines, i;

    lines = (last+chunk).split("\n");
    for(i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[i];
});

process.stdin.on('end', function() {
    console.log("line: " + last);
});

process.stdin.resume();

I did come up with this when working on a quick log parsing script that needed to accumulate data during the log parsing, and I felt it would be nice to try doing this using JS and Node instead of Perl or bash.

Anyway, I do feel that small Node.js scripts should be self-contained and not rely on third-party modules, so after reading all the answers to this question, each using various modules to handle line parsing, a 13-SLOC native Node.js solution might be of interest.
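
As the comments below point out, multi-byte UTF-8 characters can be split across chunk boundaries by the snippet above. One possible fix (a sketch, not part of the original snippet) is to decode each chunk with the built-in string_decoder module before splitting:

var StringDecoder = require('string_decoder').StringDecoder;
var decoder = new StringDecoder('utf8');
var last = "";

process.stdin.on('data', function(chunk) {
    // decoder.write() holds back incomplete multi-byte sequences until the next chunk
    var lines = (last + decoder.write(chunk)).split("\n");
    for (var i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[lines.length - 1];
});

process.stdin.on('end', function() {
    last += decoder.end();
    console.log("line: " + last);
});

process.stdin.resume();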

Michael Robinson
  • 29,278
  • 12
  • 104
  • 130
Ernelli
  • 3,960
  • 3
  • 28
  • 34
  • There doesn't seem to be any trivial way to extend this to work with arbitrary files besides just `stdin`... unless I'm missing something. – hippietrail Jan 02 '13 at 01:55
  • 3
    @hippietrail you can create a `ReadStream` with `fs.createReadStream('./myBigFile.csv')` and use it instead of `stdin` – nolith May 08 '13 at 12:57
  • 2
    Is each chunk guaranteed to contain only complete lines? Are multi-byte UTF-8 characters guaranteed not to be split at chunk boundaries? – hippietrail May 08 '13 at 22:38
  • 1
    @hippietrail I don't think multibyte characters are handled correctly by this implementation. For that, one must first correctly convert the buffers to strings and keep track of characters that are split between two buffers. To do that properly, one can use the built-in [StringDecoder](http://nodejs.org/api/string_decoder.html) – Ernelli Dec 05 '13 at 09:07
  • In the meantime there's a built-in way to read lines from a file, using the [`readline` core module](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:04
  • It wouldn't be hard to make it UTF-8 safe. Just have `last` be a `Buffer`, use `last.indexOf('\n')` and `last.slice()` instead of `split()`. The cool thing about UTF-8 is that only the bytes that can be rendered as ASCII characters will have the 8th bit set to 0. So scanning the buffer for 10 will only ever match newlines and never part of a multibyte character. But if you need more than UTF-8, a generalized decoding solution would be better. – binki Dec 13 '16 at 06:58
19

Old topic, but this works:

var rl = readline.createInterface({
      input : fs.createReadStream('/path/file.txt'),
      output: process.stdout,
      terminal: false
})
rl.on('line',function(line){
     console.log(line) //or parse line
})

Simple. No need for an external module.

Alison R.
  • 4,204
  • 28
  • 33
nf071590
  • 1,170
  • 1
  • 13
  • 17
  • 2
    If you get `readline is not defined` or `fs is not defined`, add `var readline = require('readline');` and `var fs = require('fs');` to get this to work. Otherwise sweet, sweet code. Thanks. – bergie3000 Apr 11 '15 at 07:22
  • 12
    This answer is [an exact dupe of an earlier answer](http://stackoverflow.com/a/15554600/1028230), but without the comments warning [the readline package is marked unstable](http://stackoverflow.com/questions/6156501/read-a-file-one-line-at-a-time-in-node-js#comment22523025_15554600) (still unstable as of Apr 2015) and, in mid 2013, [had trouble reading last lines of a file without line endings](http://stackoverflow.com/questions/6156501/read-a-file-one-line-at-a-time-in-node-js#comment27147074_15554600). The last line issue cropped up the 1st time I used it in v0.10.35, & then went away. /argh – ruffin Apr 30 '15 at 19:57
  • You don't need to specify the output if all you do is [read from a file stream](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:03
12

With the carrier module:

var carrier = require('carrier');

process.stdin.resume();
carrier.carry(process.stdin, function(line) {
    console.log('got one line: ' + line);
});
Dan Dascalescu
  • 143,271
  • 52
  • 317
  • 404
Touv
  • 960
  • 8
  • 10
  • Nice. This also works for any input file: `var inStream = fs.createReadStream('input.txt', {flags:'r'});` But your syntax is cleaner than the documented method of using .on(): `carrier.carry(inStream).on('line', function(line) { ... ` – Brent Faust Jan 01 '12 at 22:34
  • carrier seems to only handle `\r\n` and `\n` line endings. If you ever need to deal with MacOS-style test files from before OS X, they used `\r` and carrier does not handle this. Surprisingly, there are still such files floating about in the wild. You might also need to handle the Unicode BOM (byte order mark) explicitly, this is used at the beginning of text files in the MS Windows sphere of influence. – hippietrail Dec 31 '12 at 03:31
  • In the meantime there's a built-in way to read lines from a file, using the [`readline` core module](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:04
11

I ended up with a massive, massive memory leak using Lazy to read line by line when trying to then process those lines and write them to another stream due to the way drain/pause/resume in node works (see: http://elegantcode.com/2011/04/06/taking-baby-steps-with-node-js-pumping-data-between-streams/ (i love this guy btw)). I haven't looked closely enough at Lazy to understand exactly why, but I couldn't pause my read stream to allow for a drain without Lazy exiting.

I wrote the code to process massive csv files into xml docs; you can see the code here: https://github.com/j03m/node-csv2xml

If you run the previous revisions with the Lazy line, it leaks. The latest revision doesn't leak at all, and you can probably use it as the basis for a reader/processor, though I have some custom stuff in there.

Edit: I guess I should also note that my code with Lazy worked fine until I found myself writing large enough xml fragments that drain/pause/resume became a necessity. For smaller chunks it was fine.

j03m
  • 5,195
  • 4
  • 46
  • 50
  • In the meantime there's a much simpler way to read lines from a file, using the [`readline` core module](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:01
  • yup. That is the correct way now. But this was from 2011. :) – j03m Sep 16 '15 at 14:25
9

In most cases this should be enough:

const fs = require("fs")

fs.readFile('./file', 'utf-8', (err, file) => {
  const lines = file.split('\n')

  for (let line of lines)
    console.log(line)
});
Dorian
  • 22,759
  • 8
  • 120
  • 116
8

Edit:

Use a transform stream.
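
For example, a minimal line-splitting Transform might look like this (just a sketch of the idea; the file name is a placeholder, and this is not the BufferedReader API shown below):

var Transform = require('stream').Transform;
var fs = require('fs');

var buffered = '';
var lineSplitter = new Transform({
    objectMode: true,  // push whole lines downstream as discrete chunks
    transform: function (chunk, encoding, done) {
        var lines = (buffered + chunk).split(/\r?\n/);
        buffered = lines.pop();               // keep the trailing partial line
        for (var i = 0; i < lines.length; i++) this.push(lines[i]);
        done();
    },
    flush: function (done) {
        if (buffered) this.push(buffered);    // last line without a trailing newline
        done();
    }
});

fs.createReadStream('file.txt')
    .pipe(lineSplitter)
    .on('data', function (line) { console.log(line); });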


With a BufferedReader you can read lines.

new BufferedReader ("lorem ipsum", { encoding: "utf8" })
    .on ("error", function (error){
        console.log ("error: " + error);
    })
    .on ("line", function (line){
        console.log ("line: " + line);
    })
    .on ("end", function (){
        console.log ("EOF");
    })
    .read ();
Gabriel Llamas
  • 18,244
  • 26
  • 87
  • 112
  • 1
    In the meantime there's a much simpler way to read lines from a file, using the [`readline` core module](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:01
6

I was frustrated by the lack of a comprehensive solution for this, so I put together my own attempt (git / npm). Copy-pasted list of features:

  • Interactive line processing (callback-based, no loading the entire file into RAM)
  • Optionally, return all lines in an array (detailed or raw mode)
  • Interactively interrupt streaming, or perform map/filter like processing
  • Detect any newline convention (PC/Mac/Linux)
  • Correct eof / last line treatment
  • Correct handling of multi-byte UTF-8 characters
  • Retrieve byte offset and byte length information on per-line basis
  • Random access, using line-based or byte-based offsets
  • Automatically map line-offset information, to speed up random access
  • Zero dependencies
  • Tests

NIH? You decide :-)

panta82
  • 2,763
  • 3
  • 20
  • 38
6

Since posting my original answer, I found that split is a very easy-to-use node module for line reading in a file, which also accepts optional parameters.

var split = require('split');
fs.createReadStream(file)
    .pipe(split())
    .on('data', function (line) {
      //each chunk now is a separate line! 
    });

Haven't tested on very large files. Let us know if you do.

nf071590
  • 1,170
  • 1
  • 13
  • 17
5

I wanted to tackle this same problem, basically what in Perl would be:

while (<>) {
    process_line($_);
}

My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:

  • The minimal amount of synchronous code that I could reuse in many projects.
  • No limits on file size or number of lines.
  • No limits on length of lines.
  • Able to handle full Unicode in UTF-8, including characters beyond the BMP.
  • Able to handle *nix and Windows line endings (old-style Mac not needed for me).
  • Line endings character(s) to be included in lines.
  • Able to handle last line with or without end-of-line characters.
  • Not use any external libraries not included in the node.js distribution.

This is a project for me to get a feel for low-level scripting type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.

After a surprising amount of effort and a couple of false starts this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)

var fs            = require('fs'),
    StringDecoder = require('string_decoder').StringDecoder,
    util          = require('util');

function lineByLine(fd) {
  var blob = '';
  var blobStart = 0;
  var blobEnd = 0;

  var decoder = new StringDecoder('utf8');

  var CHUNK_SIZE = 16384;
  var chunk = new Buffer(CHUNK_SIZE);

  var eolPos = -1;
  var lastChunk = false;

  var moreLines = true;
  var readMore = true;

  // each line
  while (moreLines) {

    readMore = true;
    // append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
    while (readMore) {

      // do we have a whole line? (with LF)
      eolPos = blob.indexOf('\n', blobStart);

      if (eolPos !== -1) {
        blobEnd = eolPos;
        readMore = false;

      // do we have the last line? (no LF)
      } else if (lastChunk) {
        blobEnd = blob.length;
        readMore = false;

      // otherwise read more
      } else {
        var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);

        lastChunk = bytesRead !== CHUNK_SIZE;

        blob += decoder.write(chunk.slice(0, bytesRead));
      }
    }

    if (blobStart < blob.length) {
      processLine(blob.substring(blobStart, blobEnd + 1));

      blobStart = blobEnd + 1;

      if (blobStart >= CHUNK_SIZE) {
        // blobStart is in characters, CHUNK_SIZE is in octets
        var freeable = blobStart / CHUNK_SIZE;

        // keep blob from growing indefinitely, not as deterministic as I'd like
        blob = blob.substring(CHUNK_SIZE);
        blobStart -= CHUNK_SIZE;
        blobEnd -= CHUNK_SIZE;
      }
    } else {
      moreLines = false;
    }
  }
}

It could probably be cleaned up further; it was the result of trial and error.

hippietrail
  • 15,848
  • 18
  • 99
  • 158
5
function createLineReader(fileName){
    var EM = require("events").EventEmitter
    var ev = new EM()
    var stream = require("fs").createReadStream(fileName)
    var remainder = null;
    stream.on("data",function(data){
        if(remainder != null){//append newly received data chunk
            var tmp = new Buffer(remainder.length+data.length)
            remainder.copy(tmp)
            data.copy(tmp,remainder.length)
            data = tmp;
        }
        var start = 0;
        for(var i=0; i<data.length; i++){
            if(data[i] == 10){ //\n new line
                var line = data.slice(start,i)
                ev.emit("line", line)
                start = i+1;
            }
        }
        if(start<data.length){
            remainder = data.slice(start);
        }else{
            remainder = null;
        }
    })

    stream.on("end",function(){
        if(null!=remainder) ev.emit("line",remainder)
    })

    return ev
}


//---------main---------------
fileName = process.argv[2]

lineReader = createLineReader(fileName)
lineReader.on("line",function(line){
    console.log(line.toString())
    //console.log("++++++++++++++++++++")
})
user531097
  • 75
  • 1
  • 3
  • I will test this, but can you tell me, is it guaranteed never to break multibyte characters? (UTF-8 / UTF-16) – hippietrail Jan 02 '13 at 01:54
  • 2
    @hippietrail: The answer is no for UTF-8, even though it is working on a byte stream rather than a character stream. It breaks on newlines (0x0a). In UTF-8, all bytes of a multibyte character have their hi-order bit set. Thus, no multibyte character can include an embedded newline or other common ASCII character. UTF-16 and UTF-32 are another matter, however. – George May 10 '13 at 04:40
  • @George: I think we misunderstand each other. As CR and LF are both within the ASCII range and UTF-8 preserves the 128 ASCII characters unchanged, neither CR nor LF can ever be part of a multibyte UTF-8 character. What I was asking is whether the `data` in the call to `stream.on("data")` might ever begin or end with only part of a multibyte UTF-8 character such as `ა` which is `U+10D0`, made up of the three bytes `e1` `83` `90` – hippietrail May 10 '13 at 06:19
  • 1
    This still loads the whole file contents into memory before making it a "new line". This does not READ one line at a time, it instead takes ALL the lines and then breaks them up according to the "new line" buffer length. This method defeats the purpose of creating a stream. – Justin Jul 29 '15 at 17:13
  • In the meantime there's a much simpler way to read lines from a file, using the [`readline` core module](http://stackoverflow.com/a/32599033/1269037). – Dan Dascalescu Sep 16 '15 at 03:01
  • @hippietrail because `.slice()` is only around newline chars which are ASCII, there is no danger of splitting a char in UTF-8. But there is in other encodings such as UTF-16. In your example, valid UTF-8 input would never be `0xe1830a90` because `0x0a` *never* can be part of a multibyte character. In fact, George did answer your question even though you claim he didn't. – binki Dec 13 '16 at 07:12
4

A new function was added in Node.js v18.11.0 to read files line by line

  • filehandle.readLines([options])

This is how you use it with a text file you want to read:

import { open } from 'node:fs/promises';
myFileReader();
async function myFileReader() {
    const file = await open('./TextFileName.txt');
    for await (const line of file.readLines()) {
        console.log(line)
    }
}

To learn more, read the Node.js documentation; here is the link for the file system readLines(): https://nodejs.org/api/fs.html#filehandlereadlinesoptions

Larry
  • 401
  • 2
  • 6
3

Generator based line reader: https://github.com/neurosnap/gen-readlines

var fs = require('fs');
var readlines = require('gen-readlines');

fs.open('./file.txt', 'r', function(err, fd) {
  if (err) throw err;
  fs.fstat(fd, function(err, stats) {
    if (err) throw err;

    for (var line of readlines(fd, stats.size)) {
      console.log(line.toString());
    }

  });
});
neurosnap
  • 5,658
  • 4
  • 26
  • 30
2

If you want to read a file line by line and write it into another file:

var fs = require('fs');
var readline = require('readline');
var Stream = require('stream');

function readFileLineByLine(inputFile, outputFile) {

   var instream = fs.createReadStream(inputFile);
   var outstream = new Stream();
   outstream.readable = true;
   outstream.writable = true;

   var rl = readline.createInterface({
      input: instream,
      output: outstream,
      terminal: false
   });

   rl.on('line', function (line) {
        fs.appendFileSync(outputFile, line + '\n');
   });
};
Thami Bouchnafa
  • 1,987
  • 1
  • 15
  • 21
2
var fs = require('fs');

function readfile(name,online,onend,encoding) {
    var bufsize = 1024;
    var buffer = new Buffer(bufsize);
    var bufread = 0;
    var fd = fs.openSync(name,'r');
    var position = 0;
    var eof = false;
    var data = "";
    var lines = 0;

    encoding = encoding || "utf8";

    function readbuf() {
        bufread = fs.readSync(fd,buffer,0,bufsize,position);
        position += bufread;
        eof = bufread ? false : true;
        data += buffer.toString(encoding,0,bufread);
    }

    function getLine() {
        var nl = data.indexOf("\r"), hasnl = nl !== -1;
        if (!hasnl && eof) return fs.closeSync(fd), online(data,++lines), onend(lines); 
        if (!hasnl && !eof) readbuf(), nl = data.indexOf("\r"), hasnl = nl !== -1;
        if (!hasnl) return process.nextTick(getLine);
        var line = data.substr(0,nl);
        data = data.substr(nl+1);
        if (data[0] === "\n") data = data.substr(1);
        online(line,++lines);
        process.nextTick(getLine);
    }
    getLine();
}

I had the same problem and came up with the above solution. It looks similar to the others, but is async and can read large files very quickly.

Hope this helps.

2

Two questions we must ask ourselves while doing such operations are:

  1. What's the amount of memory used to perform it?
  2. Is the memory consumption increasing drastically with the file size?

Solutions like require('fs').readFileSync() load the whole file into memory. That means that the amount of memory required to perform operations will be almost equivalent to the file size. We should avoid these for anything larger than 50 MB.

We can easily track the amount of memory used by a function by placing these lines of code after the function invocation:

    const used = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(
      `The script uses approximately ${Math.round(used * 100) / 100} MB`
    );
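
For instance, placed after a whole-file read (a sketch, reusing the file name from the question), the reported heap usage ends up roughly proportional to the file size, whereas a streaming approach keeps it nearly flat:

const fs = require('fs');

// Whole-file read: memory use grows with the file.
const data = fs.readFileSync('./VeryBigFile.csv', 'utf-8');
console.log(`read ${data.length} characters`);

const used = process.memoryUsage().heapUsed / 1024 / 1024;
console.log(
  `The script uses approximately ${Math.round(used * 100) / 100} MB`
);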

Right now the best way to read particular lines from a large file is using node's readline. The documentation has amazing examples.

vivek agarwal
  • 435
  • 4
  • 6
2

This is my favorite way of going through a file, a simple native solution for a progressive (as in not a "slurp" or all-in-memory way) file read with modern async/await. It's a solution that I find "natural" when processing large text files without having to resort to the readline package or any non-core dependency.

let buf = '';
for await ( const chunk of fs.createReadStream('myfile') ) {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop() ?? '';
    for( const line of lines ) {
        console.log(line);
    }
}
if(buf.length) console.log(buf);  // last line, if file does not end with newline

You can adjust the encoding in fs.createReadStream or use chunk.toString(<arg>). Also, this lets you better fine-tune the line splitting to your taste, i.e. use .split(/\n+/) to skip empty lines, and control the chunk size with fs.createReadStream('myfile', { highWaterMark: <chunkSize> }).

Don't forget to create a function like processLine(line) to avoid repeating the line processing code twice due to the ending buf leftover. Unfortunately, the ReadStream instance does not update its end-of-file flags in this setup, so there's no way, afaik, to detect within the loop that we're in the last iteration without some more verbose tricks like comparing the file size from a fs.Stats() with .bytesRead. Hence the final buf processing solution, unless you're absolutely sure your file ends with a newline \n, in which case the for await loop should suffice.
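
For instance, a small sketch of that refactor (processLine is just the hypothetical helper name suggested above, and the snippet assumes the same async context as the loop above):

function processLine(line) {
    console.log(line);   // replace with your actual per-line logic
}

let buf = '';
for await (const chunk of fs.createReadStream('myfile')) {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop() ?? '';
    lines.forEach(processLine);
}
if (buf.length) processLine(buf);   // leftover last line, only if no trailing newline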

Performance Considerations

Chunk sizes are important for performance; the default is 64k for text files and, for multi-MB files, larger chunks can improve speed by an order of magnitude.

The above snippet runs at least at the same speed (or even 5% faster sometimes) as code based on Node.js v18's filehandle.readLines() or on the readline module (the accepted answer), once you tune highWaterMark to something that your machine can handle, i.e. setting it to the same size as the file, if your available memory allows it, is the fastest.

In any case, all of the Node.js line-reading answers here are an order of magnitude slower than the Perl or native *nix solutions.
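
As a rough way to see the effect of highWaterMark yourself, here is a sketch that times the same line count with two chunk sizes (the file name and sizes are placeholders, and it assumes an async context that allows top-level await):

const fs = require('fs');

async function countLines(highWaterMark) {
    let buf = '', count = 0;
    for await (const chunk of fs.createReadStream('myfile', { highWaterMark })) {
        const lines = buf.concat(chunk).split(/\r?\n/);
        buf = lines.pop() ?? '';
        count += lines.length;
    }
    return count + (buf.length ? 1 : 0);
}

for (const size of [64 * 1024, 4 * 1024 * 1024]) {
    console.time(`highWaterMark=${size}`);
    await countLines(size);
    console.timeEnd(`highWaterMark=${size}`);
}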

Similar alternatives

★ If you prefer the evented asynchronous version, this would be it:

let buf = '';
fs.createReadStream('myfile')
.on('data', chunk => {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop();
    for( const line of lines ) {
        console.log(line);
    }
})
.on('end', () => buf.length && console.log(buf) );

★ Now if you don't mind importing the stream core package, then this is the equivalent piped stream version, which allows for chaining transforms like gzip decompression:

const { Writable } = require('stream');
let buf = '';
fs.createReadStream('myfile').pipe(
    new Writable({
        write: (chunk, enc, next) => {
            const lines = buf.concat(chunk).split(/\r?\n/);
            buf = lines.pop();
            for (const line of lines) {
                console.log(line);
            }
            next();
        }
    })
).on('finish', () => buf.length && console.log(buf) );
ojosilva
  • 1,984
  • 2
  • 15
  • 21
  • I would add `buf = lines.pop() ?? ''` - as Array.pop() may return `undefined`, forcing me to trace back and find if `lines` could be empty, it also makes Typescript happy. – Victor Rybynok Feb 17 '23 at 21:32
1

I have a little module which does this well and is used by quite a few other projects: npm readline. Note that in node v10 there is a native readline module, so I republished my module as linebyline: https://www.npmjs.com/package/linebyline

If you don't want to use the module, the function is very simple:

var fs = require('fs'),
    EventEmitter = require('events').EventEmitter,
    util = require('util'),
    newlines = [
      13, // \r
      10  // \n
    ];

var readLine = module.exports = function(file, opts) {
  if (!(this instanceof readLine)) return new readLine(file);

  EventEmitter.call(this);
  opts = opts || {};
  var self = this,
      line = [],
      lineCount = 0,
      emit = function(line, count) {
        self.emit('line', new Buffer(line).toString(), count);
      };

  this.input = fs.createReadStream(file);
  this.input.on('open', function(fd) {
    self.emit('open', fd);
  })
  .on('data', function(data) {
    for (var i = 0; i < data.length; i++) {
      if (0 <= newlines.indexOf(data[i])) { // Newline char was found.
        lineCount++;
        if (line.length) emit(line, lineCount);
        line = []; // Empty buffer.
      } else {
        line.push(data[i]); // Buffer new line data.
      }
    }
  }).on('error', function(err) {
    self.emit('error', err);
  }).on('end', function() {
    // Emit last line if anything left over since EOF won't trigger it.
    if (line.length) {
      lineCount++;
      emit(line, lineCount);
    }
    self.emit('end');
  }).on('close', function() {
    self.emit('close');
  });
};
util.inherits(readLine, EventEmitter);
Maleck13
  • 1,709
  • 15
  • 17
1

Another solution is to run the logic via the sequential executor nsynjs. It reads the file line by line using the node readline module, and it doesn't use promises or recursion, so it is not going to fail on large files. Here is how the code will look:

var nsynjs = require('nsynjs');
var textFile = require('./wrappers/nodeReadline').textFile; // this file is part of nsynjs

function process(textFile) {

    var fh = new textFile();
    fh.open('path/to/file');
    var s;
    while (typeof(s = fh.readLine(nsynjsCtx).data) != 'undefined')
        console.log(s);
    fh.close();
}

var ctx = nsynjs.run(process,{},textFile,function () {
    console.log('done');
});

The code above is based on this example: https://github.com/amaksr/nsynjs/blob/master/examples/node-readline/index.js

amaksr
  • 7,555
  • 2
  • 16
  • 17
0

I use this:

function emitLines(stream, re){
    re = re || /\n/;
    var buffer = '';

    stream.on('data', stream_data);
    stream.on('end', stream_end);

    function stream_data(data){
        buffer += data;
        flush();
    }//stream_data

    function stream_end(){
        if(buffer) stream.emit('line', buffer);
    }//stream_end


    function flush(){
        var match;
        while(match = re.exec(buffer)){
            var index = match.index + match[0].length;
            stream.emit('line', buffer.substring(0, index));
            buffer = buffer.substring(index);
            re.lastIndex = 0;
        }
    }//flush

}//emitLines

Use this function on a stream and listen to the line events that it will emit.

gr-

Elmer
  • 9,147
  • 2
  • 48
  • 38
0

While you should probably use the readline module as the top answer suggests, readline appears to be oriented toward command line interfaces rather than line reading. It's also a little bit more opaque regarding buffering. (Anyone who needs a streaming line oriented reader probably will want to tweak buffer sizes). The readline module is ~1000 lines while this, with stats and tests, is 34.

const EventEmitter = require('events').EventEmitter;
class LineReader extends EventEmitter{
    constructor(f, delim='\n'){
        super();
        this.totalChars = 0;
        this.totalLines = 0;
        this.leftover = '';

        f.on('data', (chunk)=>{
            this.totalChars += chunk.length;
            let lines = chunk.split(delim);
            if (lines.length === 1){
                this.leftover += chunk;
                return;
            }
            lines[0] = this.leftover + lines[0];
            this.leftover = lines[lines.length-1];
            if (this.leftover) lines.pop();
            this.totalLines += lines.length;
            for (let l of lines) this.onLine(l);
        });
        // f.on('error', ()=>{});
        f.on('end', ()=>{console.log('chars', this.totalChars, 'lines', this.totalLines)});
    }
    onLine(l){
        this.emit('line', l);
    }
}
//Command line test
const f = require('fs').createReadStream(process.argv[2], 'utf8');
const delim = process.argv[3];
const lineReader = new LineReader(f, delim);
lineReader.on('line', (line)=> console.log(line));

Here's an even shorter version, without the stats, at 19 lines:

class LineReader extends require('events').EventEmitter{
    constructor(f, delim='\n'){
        super();
        this.leftover = '';
        f.on('data', (chunk)=>{
            let lines = chunk.split(delim);
            if (lines.length === 1){
                this.leftover += chunk;
                return;
            }
            lines[0] = this.leftover + lines[0];
            this.leftover = lines[lines.length-1];
            if (this.leftover) 
                lines.pop();
            for (let l of lines)
                this.emit('line', l);
        });
    }
}
javajosh
  • 510
  • 5
  • 12
0
const fs = require("fs")

fs.readFile('./file', 'utf-8', (err, data) => {
var innerContent;
    console.log("Asynchronous read: " + data.toString());
    const lines = data.toString().split('\n')
    for (let line of lines)
        innerContent += line + '<br>';


});
Arindam
  • 675
  • 8
  • 15
0

I wrapped the whole logic of daily line processing as an npm module: line-kit https://www.npmjs.com/package/line-kit

// example
var count = 0
require('line-kit')(require('fs').createReadStream('/etc/issue'),
                    (line) => { count++; },
                    () => {console.log(`seen ${count} lines`)})
Joyer
  • 371
  • 2
  • 9
-1

I use the code below to read lines, after verifying that it is not a directory and that it is not included in the list of files that need not be checked.

(function () {
  var fs = require('fs');
  var glob = require('glob-fs')();
  var path = require('path');
  var result = 0;
  var exclude = ['LICENSE',
    path.join('e2e', 'util', 'db-ca', 'someother-file'),
    path.join('src', 'favicon.ico')];
  var files = [];
  files = glob.readdirSync('**');

  var allFiles = [];

  var patternString = [
    'trade',
    'order',
    'market',
    'securities'
  ];

  files.map((file) => {
    try {
      if (!fs.lstatSync(file).isDirectory() && exclude.indexOf(file) === -1) {
        fs.readFileSync(file).toString().split(/\r?\n/).forEach(function(line){
          patternString.map((pattern) => {
            if (line.indexOf(pattern) !== -1) {
              console.log(file + ' contains `' + pattern + '` in line "' + line + '";');
              result = 1;
            }
          });
        });
      }
    } catch (e) {
      console.log('Error:', e.stack);
    }
  });
  process.exit(result);

})();
Aniruddha Das
  • 20,520
  • 23
  • 96
  • 132
-1

I have looked through all the answers above; all of them use a third-party library to solve it. There is a simple solution in Node's API, e.g.:

const fs= require('fs')

let stream = fs.createReadStream('<filename>', { autoClose: true })

stream.on('data', chunk => {
    let row = chunk.toString('ascii')
})
mrcode
  • 11
  • 1
  • I guess the downvotes because this won't read the entire file at once, but how can you be sure each chunk ends with new line (\n)? The logic to verify and store partial lines isn't there. – YoniXw Dec 17 '20 at 16:00