48

I have large text files, which range between 30MB and 10GB. How can I count the number of lines in a file using Node.js?

I have these limitations:

  • The entire file should not be loaded into memory
  • A child process should not be required to perform the task
hexacyanide

10 Answers

46

A solution without using `wc`:

var i;
var count = 0;
require('fs').createReadStream(process.argv[2])
  .on('data', function(chunk) {
    for (i = 0; i < chunk.length; ++i)
      if (chunk[i] == 10) count++; // 10 is the character code of '\n'
  })
  .on('end', function() {
    console.log(count);
  });

It's slower than `wc`, but not by as much as you might expect: 0.6 s for a 140 MB+ file, including Node.js load and startup time.

>time node countlines.js video.mp4 
619643

real    0m0.614s
user    0m0.489s
sys 0m0.132s

>time wc -l video.mp4 
619643 video.mp4

real    0m0.133s
user    0m0.108s
sys 0m0.024s

>wc -c video.mp4
144681406  video.mp4
Andrey Sidorov
  • 3
    Your benchmark isn't very convincing since you're running it on a file that is *not* structured into lines and as such is not representative of the sort of file the OP wants to process. The line `if (chunk[i] == 10) count++;` will be executed far more often during the analysis of a text file than during the analysis of a binary video file. – ebohlman Sep 18 '12 at 07:14
  • 1
    I don't have a 100 MB text file :) And I don't expect any difference even for a similar 100 MB text file with 10x the number of newlines - it's the same linear search iterating over every byte of each Buffer chunk – Andrey Sidorov Sep 18 '12 at 07:43
  • I replicated the input script itself and concatenated it into a single text file: 1,468,750,000 chars, 62,500,000 lines. wc time: 0m1.375s, node.js time: 0m6.254s. The same 4.5x difference (which could be better, but still good enough for JS vs a C program) – Andrey Sidorov Sep 18 '12 at 07:58
  • I have created NPM package that does just this. https://www.npmjs.com/package/count-lines-in-file – Gajus May 08 '16 at 13:15
  • 2
    Excuse my innocence, but what does "chunk[i] == 10" mean? I guess that if the chunk is equal to 10 it's a new line, but why compare to the number 10? – Ashbay Sep 28 '17 at 19:18
  • 8
    10 is the ASCII code for the "New Line" character. For better readability you could add, a few lines earlier, `const LINE_FEED = '\n'.charCodeAt(0)` and then `if (chunk[i] == LINE_FEED) count++` – Andrey Sidorov Oct 02 '17 at 00:26
  • 3
    Your implementation is off by one. For example, if your file has 2 lines, then it only has 1 `newline`, so your script will log `1`. – Benjamin Mar 07 '19 at 23:42
  • Depends - on unix a text file ought to end with a newline. I guess the program should check whether the last character is a newline and then adjust accordingly. – nsandersen Jan 06 '20 at 17:09
34

We can use indexOf to let the VM find the newlines:

const fs = require('fs');

function countFileLines(filePath) {
  return new Promise((resolve, reject) => {
    let lineCount = 0;
    fs.createReadStream(filePath)
      .on("data", (buffer) => {
        let idx = -1;
        lineCount--; // because the loop below runs once even when idx starts at -1
        do {
          idx = buffer.indexOf(10, idx + 1);
          lineCount++;
        } while (idx !== -1);
      }).on("end", () => {
        resolve(lineCount);
      }).on("error", reject);
  });
}

This solution finds the position of the first newline with .indexOf, increments lineCount, then looks for the next one. The second parameter to .indexOf tells it where to start looking, so we jump over large stretches of the buffer. The while loop runs once per newline, plus one.

We let the Node runtime do the searching for us, which is implemented at a lower level and should be faster.

On my system this is about twice as fast as running a for loop over the buffer length on a large file (111 MB).
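
The speedup is easy to check yourself; here is a minimal micro-benchmark sketch over an in-memory Buffer (the buffer size and fill pattern are arbitrary choices of mine):

```javascript
// Compare a byte-by-byte loop with Buffer#indexOf on the same 50 MB buffer.
const buf = Buffer.alloc(50 * 1024 * 1024, 'some text line\n');

function countLoop(buffer) {
  let count = 0;
  for (let i = 0; i < buffer.length; ++i) if (buffer[i] === 10) count++;
  return count;
}

function countIndexOf(buffer) {
  let count = 0;
  let idx = -1;
  while ((idx = buffer.indexOf(10, idx + 1)) !== -1) count++;
  return count;
}

console.time('loop');
const loopCount = countLoop(buf);
console.timeEnd('loop');

console.time('indexOf');
const indexOfCount = countIndexOf(buf);
console.timeEnd('indexOf');

console.log(loopCount === indexOfCount); // both approaches agree on the count
```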

Emil Vikström
30

You could do this as the comments suggest, using `wc`:

var exec = require('child_process').exec;

exec('wc -l /path/to/file', function (error, results) {
    console.log(results);
});
Michael Mior
Menztrual
6
var fs = require('fs');
var filename = process.argv[2];
var data = fs.readFileSync(filename);
var res = data.toString().split('\n').length;
console.log(res - 1);
ruchi gupta
    While this code snippet may solve the question, [including an explanation](https://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations! – Box Box Box Box Jun 08 '16 at 03:48
  • 3
    This solution requires loading the file in memory. I would advise against it. The answer using `wc` doesn't because `wc` is optimized to stream the file. – Thalis K. Jun 27 '17 at 14:01
  • 2
    The answer also doesn't add anything valuable compared to [Alan Viars](https://stackoverflow.com/a/32286822/238978) who posted the same thing a year before. – Emil Vikström Dec 05 '17 at 11:03
  • 2
    The question specifically states that the files range from 30MB to 10GB. This solution reads the entire file into memory before processing. This would likely cause the code to crash because JavaScript would run out of memory – thekenobe May 13 '19 at 18:23
  • Your code would not be able to handle larger file sizes. – Rahul Gupta Jan 17 '23 at 06:13
5

Since io.js 1.5.0 there is a `Buffer#indexOf()` method; using it to compare with Andrey Sidorov's answer:

ubuntu@server:~$ wc logs
  7342500  27548750 427155000 logs
ubuntu@server:~$ time wc -l logs 
7342500 logs

real    0m0.180s
user    0m0.088s
sys 0m0.084s
ubuntu@server:~$ nvm use node
Now using node v0.12.1
ubuntu@server:~$ time node countlines.js logs 
7342500

real    0m2.559s
user    0m2.200s
sys 0m0.340s
ubuntu@server:~$ nvm use iojs
Now using node iojs-v1.6.2
ubuntu@server:~$ time iojs countlines2.js logs 
7342500

real    0m1.363s
user    0m0.920s
sys 0m0.424s
ubuntu@server:~$ cat countlines.js 
var i;
var count = 0;
require('fs').createReadStream(process.argv[2])
  .on('data', function(chunk) {
    for (i=0; i < chunk.length; ++i)
      if (chunk[i] == 10) count++;
  })
  .on('end', function() {
    console.log(count);
  });
ubuntu@server:~$ cat countlines2.js 
var i;
var count = 0;
require('fs').createReadStream(process.argv[2])
  .on('data', function(chunk) {
    var index = -1;
    while((index = chunk.indexOf(10, index + 1)) > -1) count++
  })
  .on('end', function() {
    console.log(count);
  });
ubuntu@server:~$ 
undoZen
4

If you use Node 8 or above, you can use this async/await pattern:

const util = require('util');
const exec = util.promisify(require('child_process').exec);

async function fileLineCount({ fileLocation }) {
  const { stdout } = await exec(`cat ${fileLocation} | wc -l`);
  return parseInt(stdout, 10);
}

// Usage

async function someFunction() {
  const lineCount = await fileLineCount({ fileLocation: 'some/file.json' });
}
Jason Kim
3

Here is another way without so much nesting.

var fs = require('fs');
var filePath = process.argv[2];
var fileBuffer = fs.readFileSync(filePath);
var to_string = fileBuffer.toString();
var split_lines = to_string.split("\n");
console.log(split_lines.length - 1);
Alan Viars
3

The best solution I've found uses promises with async and await. This is also an example of how to await the fulfillment of a promise:

#!/usr/bin/env node
const fs = require('fs');
const readline = require('readline');
function main() {
    function doRead() {
        return new Promise(resolve => {
            var inf = readline.createInterface({
                input: fs.createReadStream('async.js'),
                crlfDelay: Infinity
            });
            var count = 0;
            inf.on('line', (line) => {
                console.log(count + ' ' + line);
                count += 1;
            });
            inf.on('close', () => resolve(count));
        });
    }
    async function showRead() {
        var x = await doRead();
        console.log('line count: ' + x);
    }
    showRead();
}
main();
David Dombrowsky
  • 1
    It's incorrect to say that you can turn an async function into a synchronous function. Your top-level main function needs to be `async` so that it can call `await` on `showRead()`. The only reason you get an apparent confirmation of your statement is because the NodeJs event loop is waiting for the IO phase to complete, and the program won't terminate until then. If you add a logging statement right below `showRead()` it would execute immediately – Felipe Mar 20 '19 at 19:15
  • 1
    Correct. This was more an example of how to use `await` to wait for the fulfillment of a promise. Poor choice of words on my part. I will fix that. – David Dombrowsky Mar 22 '19 at 14:48
1

You can also use indexOf():

var index = -1;
var count = 0;
while ((index = chunk.indexOf(10, index + 1)) > -1) count++;
Jeff Kilbride
1

A simple solution using readline:

import readline from 'node:readline';

export default async function countLines(input) {
    let lineCount = 0;

    for await (const _ of readline.createInterface({input, crlfDelay: Infinity})) {
        lineCount++;
    }

    return lineCount;
}

import fs from 'node:fs';

console.log(await countLines(fs.createReadStream('file.txt')));
//=> <number>
Richie Bendall