
I would like to read a very, very large file into a JavaScript array in node.js.

So, if the file is like this:

first line
two 
three
...
...

I would have the array:

['first line','two','three', ... , ... ] 

The function would look like this:

var array = load(filename); 

Therefore the idea of loading it all as a string and then splitting it is not acceptable.

Charles Merriam
chacko
  • This question needs some serious editing and cleanup. It says **read a text file into an array**, but when you read all the answers and comments, it really means **read a text file one line at a time**. For that question @zswang has the best answer so far. – Jess Nov 27 '13 at 03:46
  • yup just read that file and push each line into an array: http://stackoverflow.com/a/34033928/1536309 – Blair Anderson Dec 09 '15 at 21:17

15 Answers


Synchronous:

var fs = require('fs');
var array = fs.readFileSync('file.txt').toString().split("\n");
for (var i in array) {
    console.log(array[i]);
}

Asynchronous:

var fs = require('fs');
fs.readFile('file.txt', function(err, data) {
    if (err) throw err;
    var array = data.toString().split("\n");
    for (var i in array) {
        console.log(array[i]);
    }
});
Finbarr
  • thanks. Unfortunately I had to edit my question. I mean how to read a massively large file. Reading it all in a string is not acceptable. – chacko Jul 26 '11 at 15:11
  • Maybe should not make i global and use for(var i in ... instead? – imns Oct 03 '13 at 01:58
  • Just what I needed. Simple and quick. – Hcabnettek Apr 30 '14 at 17:31
  • This is useful for some of us! – Indolering May 21 '14 at 03:23
  • Why the 'toString' instead of the option 'utf8', as in: fs.readFileSync('file.txt', 'utf8').split("\n"); – flaky Oct 11 '14 at 20:08
  • I found that on a file made by Windows I had to split on \r\n, but that broke Macs; a more robust `array = string.replace(/\r\n/g,'\n').split('\n');` worked for both – Will Hancock May 05 '15 at 15:37
  • There is some problem with Stack Overflow: I now often find highly voted answers only after scrolling down too far. This is an example; it has the highest votes but sits at the very bottom of the page. I think Stack Overflow needs to improve its ordering algorithm. – shashwat May 25 '15 at 12:24
  • Thank you @Finbarr. Your answer is pretty useful, and you provide both asynchronous and synchronous ways to achieve this. – lmiguelvargasf Jun 11 '15 at 01:29
  • Is there a reason why the console gets stuck after outputting all the lines? I am using the asynchronous method and am wanting something else to be output afterwards. However, it seems that the file-reading callback never releases itself or something like that. – MadPhysicist Jul 23 '16 at 17:26
  • @shashwat The person who asks the question gets to decide which is the right answer. In this case, they needed a streaming solution for large files and putting the entire file in a string is unacceptable. Nothing wrong with SO, really. – legalize Sep 15 '16 at 22:12
  • @shashwat possibly you're sorting answers by "active" rather than "votes"? – bearacuda13 Jul 09 '18 at 19:50
  • @WillHancock why not use `os.EOL` instead of that monstrosity? – Guido Tarsia Oct 06 '20 at 17:36
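Picking up the suggestions in the comments above, a minimal sketch of a line-ending-agnostic variant (splitting on the regex /\r?\n/ handles both Windows and Unix files, whereas `os.EOL` only matches the line ending of the machine running the script):

var fs = require('fs');

// Split on \n with an optional preceding \r, so files written on
// Windows (\r\n) and on Linux/macOS (\n) both parse correctly.
var array = fs.readFileSync('file.txt', 'utf8').split(/\r?\n/);

array.forEach(function(line) {
    console.log(line);
});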

If you can fit the final data into an array, then wouldn't you also be able to fit it in a string and split it, as has been suggested? In any case, if you would like to process the file one line at a time, you can also try something like this:

var fs = require('fs');

function readLines(input, func) {
  var remaining = '';

  input.on('data', function(data) {
    remaining += data;
    var index = remaining.indexOf('\n');
    while (index > -1) {
      var line = remaining.substring(0, index);
      remaining = remaining.substring(index + 1);
      func(line);
      index = remaining.indexOf('\n');
    }
  });

  input.on('end', function() {
    if (remaining.length > 0) {
      func(remaining);
    }
  });
}

function func(data) {
  console.log('Line: ' + data);
}

var input = fs.createReadStream('lines.txt');
readLines(input, func);

EDIT: (in response to the comment by phopkins) I think (at least in newer versions) substring does not copy data but creates a special SlicedString object (from a quick glance at the v8 source code). In any case here is a modification that avoids the mentioned substring (tested on a file containing several megabytes of "All work and no play makes Jack a dull boy"):

function readLines(input, func) {
  var remaining = '';

  input.on('data', function(data) {
    remaining += data;
    var index = remaining.indexOf('\n');
    var last  = 0;
    while (index > -1) {
      var line = remaining.substring(last, index);
      last = index + 1;
      func(line);
      index = remaining.indexOf('\n', last);
    }

    remaining = remaining.substring(last);
  });

  input.on('end', function() {
    if (remaining.length > 0) {
      func(remaining);
    }
  });
}
mtomis
  • Thanks. to answer your question: no, the string would be too large. – chacko Jul 26 '11 at 16:29
  • I tried this on files of around 2MB or so and it was painfully slow, much slower than reading in the files synchronously to a string. I think the issue is the remaining = remaining.substring line. Node's "data" might give you a lot at a time, and doing that copy for every line quickly becomes O(n^2). – Fiona Hopkins May 18 '12 at 02:08
  • @Finbarr's answer is much better – rü- May 11 '18 at 08:27

Using the Node.js readline module.

var fs = require('fs');
var readline = require('readline');

var filename = process.argv[2];
readline.createInterface({
    input: fs.createReadStream(filename),
    terminal: false
}).on('line', function(line) {
   console.log('Line: ' + line);
});
Yves M.
zswang
  • Sadly there is a **problem** with this solution: you don't get the last line if the file doesn't have a `\n` at the end! See: http://stackoverflow.com/questions/18450197/nodejs-readline-missing-last-line-of-file – Yves M. Mar 25 '15 at 13:44
  • Node has fixed that issue with the \n http://stackoverflow.com/a/32599033/3763850 – Gemtastic Oct 20 '15 at 10:38
  • Please help me: how? – VARUN Jun 24 '22 at 07:02
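Since the question asks for an array, here is a minimal sketch building on this answer that collects the lines and uses the 'close' event, which fires after the last line has been emitted:

var fs = require('fs');
var readline = require('readline');

var lines = [];
readline.createInterface({
    input: fs.createReadStream('file.txt'),
    terminal: false
}).on('line', function(line) {
    // Each 'line' event delivers one line without the trailing newline.
    lines.push(line);
}).on('close', function() {
    // The stream has ended; every line is in the array now.
    console.log(lines);
});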

js:

var array = fs.readFileSync('file.txt', 'utf8').split('\n');

ts:

var array = fs.readFileSync('file.txt', 'utf8').toString().split('\n');
hojin
  • To prevent the above from throwing ```TypeError: fs.readFileSync(...).split is not a function```, you should use .toString() like this: ```var array = fs.readFileSync('file.txt', 'utf8').toString().split('\n');``` – Qua285 Mar 19 '20 at 10:51

Essentially this will do the job: .replace(/\r\n/g,'\n').split('\n'). This works on Mac, Linux & Windows.

Code Snippets

Synchronous:

const { readFileSync } = require('fs');

const array = readFileSync('file.txt').toString().replace(/\r\n/g,'\n').split('\n');

for(let i of array) {
    console.log(i);
}

Asynchronous:

The fs.promises API provides an alternative set of asynchronous file system methods that return Promise objects rather than using callbacks. (No need to promisify; you can use async/await with this too, available in Node.js 10.0.0 and later.)

const { readFile } = require('fs').promises;

// The promise-based readFile takes no callback; await its result instead.
(async () => {
    const data = await readFile('file.txt');

    const arr = data.toString().replace(/\r\n/g,'\n').split('\n');

    for (let i of arr) {
        console.log(i);
    }
})();

More about \r & \n here: \r\n, \r and \n what is the difference between them?

MiKr13

Use readline (documentation). Here's an example reading a CSS file, parsing it for icons, and writing them to JSON.

var fs = require('fs');

var results = [];
var rl = require('readline').createInterface({
  input: fs.createReadStream('./assets/stylesheets/_icons.scss')
});

// for every new line, if it matches the regex, add it to an array
// this is ugly regex :)
rl.on('line', function (line) {
  var re = /\.icon-icon.*:/;
  var match;
  if ((match = re.exec(line)) !== null) {
    results.push(match[0].replace(".", '').replace(":", ''));
  }
});

// readline emits a close event when the file is read.
rl.on('close', function () {
  var outputFilename = './icons.json';
  fs.writeFile(outputFilename, JSON.stringify(results, null, 2), function (err) {
    if (err) {
      console.log(err);
    } else {
      console.log("JSON saved to " + outputFilename);
    }
  });
});
Blair Anderson

file.lines with my JFile package

Pseudo

var JFile=require('jfile');

var myF=new JFile("./data.txt");
myF.lines // ["first line","second line"] ....

Don't forget to install it first:

npm install jfile --save
Stephen Rauch
Abdennour TOUMI

With a BufferedReader; note that the function has to be asynchronous:

// BufferedReader comes from the third-party "buffered-reader" npm package;
// older versions exported the class directly, matching this answer's usage.
var BufferedReader = require("buffered-reader");

var load = function (file, cb){
    var lines = [];
    new BufferedReader (file, { encoding: "utf8" })
        .on ("error", function (error){
            cb (error, null);
        })
        .on ("line", function (line){
            lines.push (line);
        })
        .on ("end", function (){
            cb (null, lines);
        })
        .read ();
};

load ("file", function (error, lines){
    if (error) return console.log (error);
    console.log (lines);
});
Gabriel Llamas

To read a big file into an array you can read it line by line or chunk by chunk.

Line by line: refer to my answer here.

var fs = require('fs');
var es = require('event-stream'); // third-party package: npm install event-stream

var lines = [];

var s = fs.createReadStream('filepath')
    .pipe(es.split())
    .pipe(es.mapSync(function(line) {
            // pause the readstream while handling this line
            s.pause();
            lines.push(line);
            s.resume();
        })
        .on('error', function(err) {
            console.log('Error:', err);
        })
        .on('end', function() {
            console.log('Finish reading.');
            console.log(lines);
        })
    );

Chunk by chunk: refer to this article.

var fs = require('fs');

var lines = [];
var offset = 0;
var chunkSize = 2048;
var chunkBuffer = Buffer.alloc(chunkSize);
var fp = fs.openSync('filepath', 'r');
var bytesRead = 0;
while ((bytesRead = fs.readSync(fp, chunkBuffer, 0, chunkSize, offset))) {
    offset += bytesRead;
    var str = chunkBuffer.slice(0, bytesRead).toString();
    var arr = str.split('\n');

    if (bytesRead === chunkSize) {
        // the last item of arr may not be a full line; leave it to the next chunk
        offset -= arr.pop().length;
    }
    lines = lines.concat(arr);
}
fs.closeSync(fp);
console.log(lines);
LF00

This is a variation on the answer above by @mtomis.

It creates a stream of lines. It emits 'data' and 'end' events, allowing you to handle the end of the stream.

var events = require('events');

var LineStream = function (input) {
    var remaining = '';

    input.on('data', function (data) {
        remaining += data;
        var index = remaining.indexOf('\n');
        var last = 0;
        while (index > -1) {
            var line = remaining.substring(last, index);
            last = index + 1;
            this.emit('data', line);
            index = remaining.indexOf('\n', last);
        }
        remaining = remaining.substring(last);
    }.bind(this));

    input.on('end', function() {
        if (remaining.length > 0) {
            this.emit('data', remaining);
        }
        this.emit('end');
    }.bind(this));
}

LineStream.prototype = new events.EventEmitter;

Use it as a wrapper:

var lineInput = new LineStream(input);

lineInput.on('data', function (line) {
    // handle line
});

lineInput.on('end', function() {
    // wrap it up
});
oferei
    You will end with having events shared between instances. `var EventEmitter = require('events').EventEmitter; var util = require('util'); function GoodEmitter() { EventEmitter.call(this); } util.inherits(GoodEmitter, EventEmitter);` – TrejGun May 25 '14 at 08:55
  • What instances are you talking about exactly? – oferei May 26 '14 at 20:54
  • try to create `var li1 = new LineStream(input1), li2 = new LineStream(input2);` then count how many times 'end' is fired for each one – TrejGun May 29 '14 at 16:32
  • tried it. 'end' was fired once for each instance. `var fs = require('fs'); var input1 = fs.createReadStream('text.txt'); var ls1 = new LineStream(input1); ls1.on('data', function (line) { console.log('1:line=' + line); }); ls1.on('end', function (line) { console.log('1:fin'); }); var input2 = fs.createReadStream('text.txt'); var ls2 = new LineStream(input2); ls2.on('data', function (line) { console.log('2:line=' + line); }); ls2.on('end', function (line) { console.log('2:fin'); }); ` output: each line in the text file was fired once for each instance. so was 'end'. – oferei Jun 01 '14 at 07:41
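For reference, a minimal sketch of the inheritance pattern TrejGun suggests in the comments, which initializes per-instance emitter state instead of assigning a single EventEmitter instance as the shared prototype:

var EventEmitter = require('events').EventEmitter;
var util = require('util');

function LineStream(input) {
    EventEmitter.call(this); // per-instance emitter state
    // ... wire up input.on('data') / input.on('end') as in the answer above ...
}
util.inherits(LineStream, EventEmitter);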

I just want to add to @finbarr's great answer a little fix in the asynchronous example:

Asynchronous:

var fs = require('fs');
fs.readFile('file.txt', function(err, data) {
    if(err) throw err;
    var array = data.toString().split("\n");
    for (var i in array) {
        console.log(array[i]);
    }
    done(); // done() is assumed to be defined by the caller
});

@MadPhysicist, done() is what releases the async call.
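To connect this back to the load() function the question asked for, a minimal sketch where the callback plays the role of done() (the names here are illustrative, not from the original answer):

var fs = require('fs');

// Hypothetical wrapper: hands the finished array to a callback.
function load(filename, done) {
    fs.readFile(filename, function(err, data) {
        if (err) return done(err);
        done(null, data.toString().split("\n"));
    });
}

load('file.txt', function(err, array) {
    if (err) throw err;
    console.log(array);
});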

HernanFila

Node.js v8 and later ship with a new utility that converts a normal callback-style function into a promise-returning one:

util.promisify

It's an awesome feature. Here's an example of parsing 10,000 numbers from a txt file into an array, then counting inversions using merge sort on the numbers.

// read from txt file
const util = require('util');
const fs = require('fs');
fs.readFileAsync = util.promisify(fs.readFile);
let result = [];

const parseTxt = async (txtFile) => {
  const data = await fs.readFileAsync(txtFile);
  const str = data.toString();
  const lines = str.split('\r\n');
  console.log("lines", lines);

  lines.forEach(line => {
    if (!line) { return; }
    result.push(Number(line));
  });
  console.log("result", result);
  return result;
};

parseTxt('./count-inversion.txt').then(() => {
  // mergeSort is assumed to be defined elsewhere in the author's project
  console.log(mergeSort({ arr: result, count: 0 }));
});

I had the same problem, and I have solved it with the module line-by-line

https://www.npmjs.com/package/line-by-line

At least for me it works like a charm, both in synchronous and asynchronous mode.

Also, the problem of lines terminating or not terminating with \n can be solved with the option:

{ encoding: 'utf8', skipEmptyLines: false }
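Passed like this (a minimal sketch; per the package's README the options object is the second constructor argument):

var LineByLineReader = require('line-by-line');
var lr = new LineByLineReader('big_file.txt', { encoding: 'utf8', skipEmptyLines: false });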

Synchronous processing of lines:

var LineByLineReader = require('line-by-line'),
    lr = new LineByLineReader('big_file.txt');

lr.on('error', function (err) {
    // 'err' contains error object
});

lr.on('line', function (line) {
    // 'line' contains the current line without the trailing newline character.
});

lr.on('end', function () {
    // All lines are read, file is closed now.
}); 
Antoni

Another answer using an npm package. The nexline package allows one to asynchronously read a file line-by-line:

"use strict";

import fs from 'fs';
import nexline from 'nexline';

const lines = [];
const reader = nexline({
    input: fs.createReadStream(`path/to/file.ext`)
});

while(true) {
    const line = await reader.next();
    if(line === null) break; // line is null if we reach the end
    if(line.length === 0) continue; // Ignore empty lines
    
    // Process the line here - below is just an example
    lines.push(line);
}

This approach will work even if your text file is larger than the maximum allowed string length, thereby avoiding the `Error: Cannot create a string longer than 0x1fffffe8 characters` error.

starbeamrainbowlabs

To put each line as an item inside an array, a new function was added in Node.js v18.11.0 to read files line by line:

  • filehandle.readLines([options])

This is how you use it with a text file, to read the file and put each line in an array:

import { open } from 'node:fs/promises';

const arr = [];
myFileReader();

async function myFileReader() {
    const file = await open('./TextFileName.txt');
    for await (const line of file.readLines()) {
        arr.push(line);
    }
    console.log(arr);
}

To learn more, see the Node.js file system documentation for filehandle.readLines(): https://nodejs.org/api/fs.html#filehandlereadlinesoptions

Larry