178

With Node.js I want to parse a .csv file of 10000 records and perform a time-consuming operation on each row. I tried using http://www.adaltas.com/projects/node-csv, but I couldn't get it to pause at each row; it just reads through all 10000 records. I need to do the following:

  1. read csv line by line
  2. perform time consuming operation on each line
  3. go to the next line

Can anyone please suggest any alternative ideas here?

Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574
lonelymo
  • 3,972
  • 6
  • 28
  • 36

19 Answers

111

Seems like you need to use a stream-based library such as fast-csv, which also includes validation support.

NB: As the fast-csv package is not actively maintained, consider looking into an alternative such as csv.
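For reference, a minimal sketch of the stream-based fast-csv usage with row validation; the file name data.csv and the email check are illustrative assumptions, not part of the original answer:

const csv = require('fast-csv');

csv.parseFile('data.csv', { headers: true })   // data.csv is a placeholder path
    .validate(row => row.email !== '')         // example rule: reject rows with an empty email field
    .on('data', row => {
        // valid row: perform the per-row operation here
        console.log('valid row', row);
    })
    .on('data-invalid', (row, rowNumber) => {
        console.log('row ' + rowNumber + ' failed validation', row);
    })
    .on('error', err => console.error(err))
    .on('end', rowCount => console.log('parsed ' + rowCount + ' rows'));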

Risto Novik
  • 8,199
  • 9
  • 50
  • 66
95

I used this approach:

var fs = require('fs'); 
var parse = require('csv-parse');

var csvData=[];
fs.createReadStream(req.file.path)
    .pipe(parse({delimiter: ':'}))
    .on('data', function(csvrow) {
        console.log(csvrow);
        //do something with csvrow
        csvData.push(csvrow);        
    })
    .on('end',function() {
      //do something with csvData
      console.log(csvData);
    });
Kuchi
  • 4,204
  • 3
  • 29
  • 37
vineet
  • 13,832
  • 10
  • 56
  • 76
  • 5
    I may be doing something wrong, but when I run this, `parse` isn't defined. Is there something I'm missing? When I run `npm install csv-parse` and then in my code add `var parse = require("csv-parse");`, then it works. Are you sure yours works? Either way, I love this solution (even if I have to include the `csv-parse` module – Ian May 18 '16 at 14:47
  • 1
    you are right @lan, it should be include `csv-parse` module. – vineet May 19 '16 at 05:02
  • 1
    Awesome, thank you for verifying and updating your answer! – Ian Sep 20 '16 at 13:51
  • 4
    Nice solution. Works for me. – Sun Bee Oct 13 '16 at 08:44
  • 4
    sadly this is bad - i got errors with huge files and long lines.... (memory errors - though other ways of reading it - works) – Seti Dec 05 '16 at 13:30
  • 1
    Clean and Simple. Thanks a ton! – Shwetabh Shekhar Apr 12 '18 at 08:34
  • @Seti This isn't a bad answer, you just should understand that using this answer in conjunction with a large dataset will crash your application. That's just common sense, no offense. – Toby Caulk Aug 07 '18 at 20:00
  • @TobyCaulk sorry for long answer, but no - its not common sence, i can imagine junior which got test data set, got this answer and pushed it into production then ton of data crashed application on production - thats why i noted that its bad answer. – Seti Apr 05 '19 at 07:40
  • 3
    Sadly this adds to the array but can only be accessed via the .on(‘end’), not outside this statement. This I think is due to it being synchronous. Asyncronus functionality needs explaining. – A McGuinness Apr 07 '19 at 11:17
  • This doesn't work. – BitShift Jan 14 '22 at 09:56
  • 7
    Currently, parse is a named export var { parse } = require("csv-parse"); – Abhishek E H Feb 26 '22 at 06:40
  • Worked for me after changing the delimiter to comma: `.pipe(parse({delimiter: ','}))` – Daniel James Apr 05 '23 at 04:15
  • Worked for me with version `"csv-parse": "5.4.0"` after doing this: first import `const parser = require("csv-parse");` then use parser `.pipe(parser.parse({delimiter: ";"}))` – Theodosios Asvestopoulos Jun 26 '23 at 14:37
59

My current solution uses the async module to execute in series:

var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');

var inputFile='myfile.csv';

var parser = parse({delimiter: ','}, function (err, data) {
  async.eachSeries(data, function (line, callback) {
    // do something with the line
    doSomething(line).then(function() {
      // when processing finishes invoke the callback to move to the next one
      callback();
    });
  })
});
fs.createReadStream(inputFile).pipe(parser);
Abdul Hameed
  • 1,008
  • 1
  • 15
  • 35
prule
  • 2,536
  • 31
  • 32
  • 1
    I think you miss some ')' ? – Steve Lng C Jul 11 '16 at 10:13
  • I think adding a ')' to the end of lines 14 and 15 should fix the problem. – Jon Jul 18 '16 at 09:07
  • @ShashankVivek - in this old answer (from 2015), 'async' is an npm library that is used. More about it here https://caolan.github.io/async/ - to understand why maybe this helps https://blog.risingstack.com/node-hero-async-programming-in-node-js/ But javascript has evolved a lot since 2015, and if your question is more about async in general, then read this more recent article https://medium.com/@tkssharma/writing-neat-asynchronous-node-js-code-with-promises-async-await-fa8d8b0bcd7c – prule Aug 07 '18 at 22:01
39
  • This solution uses csv-parser instead of the csv-parse used in some of the answers above.
  • csv-parser came out about 2 years after csv-parse.
  • Both serve the same purpose, but personally I have found csv-parser better, as it makes handling headers easy.

Install the csv-parser first:

npm install csv-parser

So suppose you have a csv-file like this:

NAME, AGE
Lionel Messi, 31
Andres Iniesta, 34

You can perform the required operation as:

const fs = require('fs'); 
const csv = require('csv-parser');

fs.createReadStream(inputFilePath)
.pipe(csv())
.on('data', function(data){
    try {
        console.log("Name is: "+data.NAME);
        console.log("Age is: "+data.AGE);

        //perform the operation
    }
    catch(err) {
        //error handler
    }
})
.on('end',function(){
    //some final operation
});  

For further reading, refer to the csv-parser documentation.

Another advantage of using csv-parser instead of csv-parse, as mentioned in the comments, is:

csv-parser is about 27KB while csv-parse is 1.6MB

Pransh Tiwari
  • 3,983
  • 1
  • 32
  • 43
  • 3
    Thanks for sharing. One of the biggest benefits of `csv-parser` is the size of the package. `csv-parser` is about 27KB while `csv-parse` is 1.6MB. – Amin Mousavi Dec 15 '21 at 08:56
  • 2
    `csv-parser` works best for me specially getting it in data format – Salem Apr 28 '22 at 20:47
  • `csv-parser` hasn't seen an update for over two years now. would not recommend. – choise Dec 08 '22 at 10:40
20

In order to pause the streaming in fast-csv you can do the following:

let csvstream = csv.fromPath(filePath, { headers: true })
    .on("data", function (row) {
        csvstream.pause();
        // do some heavy work
        // when done resume the stream
        csvstream.resume();
    })
    .on("end", function () {
        console.log("We are done!")
    })
    .on("error", function (error) {
        console.log(error)
    });
adnan kamili
  • 8,967
  • 7
  • 65
  • 125
  • 1
    csvstream.pause() and resume() is what I've been looking for! My applications would always run out of memory because it read data much faster than what it could process. – ehrhardt Jan 18 '18 at 06:51
  • 1
    @adnan Thanks for pointing this out. It is not mentioned in the documentation and that's what I was also looking for. – Piyush Beli Jan 05 '19 at 15:34
12

The node-csv project that you are referencing is completely sufficient for the task of transforming each row of a large portion of CSV data. From the docs at http://csv.adaltas.com/transform/:

csv()
  .from('82,Preisner,Zbigniew\n94,Gainsbourg,Serge')
  .to(console.log)
  .transform(function(row, index, callback){
    process.nextTick(function(){
      callback(null, row.reverse());
    });
  });

From my experience, I can say that it is also a rather fast implementation; I have been working with it on data sets of nearly 10k records, and the processing times were at a reasonable tens-of-milliseconds level for the whole set.

Regarding jurka's stream-based solution suggestion: node-csv IS stream based and follows the Node.js streaming API.

krwck
  • 131
  • 1
  • 4
11

The fast-csv npm module can read data line-by-line from a csv file.

Here is an example:

let csv = require('fast-csv');
let fs = require('fs');

var stream = fs.createReadStream("my.csv");

csv
 .parseStream(stream, {headers : true})
 .on("data", function(data){
     console.log('I am one line of data', data);
 })
 .on("end", function(){
     console.log("done");
 });
whoami - fakeFaceTrueSoul
  • 17,086
  • 6
  • 32
  • 46
10
var fs = require("fs");
// READ CSV INTO A STRING
var data = fs.readFileSync("your.csv").toString();

// STRING TO ARRAY
var rows = data.split("\n"); // SPLIT ROWS
rows.forEach((row) => {
    var columns = row.split(","); // SPLIT COLUMNS
    console.log(columns);
});
Hamlet Kraskian
  • 683
  • 9
  • 11
  • 4
    Reading a whole file into memory is usually a bad idea, splitting it afterwards is even worse; now you have double the file size in memory. – Michel Jung Oct 15 '21 at 12:16
  • parsing CSV is not just splitting by comma, usually this brings to bugs and headackes. Valid CSV lines are like "this is, a text, with comma inside", "another field", 123; – Fabrizio Regini Apr 21 '22 at 13:03
9

I needed an async csv reader and originally tried @Pransh Tiwari's answer but couldn't get it working with await and util.promisify(). Eventually I came across node-csvtojson, which pretty much does the same as csv-parser, but with promises. Here is an example usage of csvtojson in action:

const csvToJson = require('csvtojson');

const processRecipients = async () => {
    const recipients = await csvToJson({
        trim:true
    }).fromFile('./recipients.csv');

    // Code executes after recipients are fully loaded.
    recipients.forEach((recipient) => {
        console.log(recipient.name, recipient.email);
    });
};
alexkb
  • 3,216
  • 2
  • 30
  • 30
4

I use this simple one: https://www.npmjs.com/package/csv-parser

Very simple to use:

const csv = require('csv-parser')
const fs = require('fs')
const results = [];

fs.createReadStream('./CSVs/Update 20191103C.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
    console.log(results[0]['Lowest Selling Price'])
  });
Xin
  • 33,823
  • 14
  • 84
  • 85
4

OK, so there are many answers here, and I don't think they answer your question, which I think is similar to mine.

You need to do an operation like contacting a database or a third-party API that takes time and is asynchronous. You do not want to load the entire document into memory because it is too large or for some other reason, so you need to read and process it line by line.

I have read through the fs documentation: it can pause reading, but using the .on('data') call makes the flow continuous, which is what most of these answers use and what causes the problem.


UPDATE: I know more about streams now than I ever wanted to.

The best way to do this is to create a writable stream. This pipes the csv data into your writable stream, in which you can manage your asynchronous calls. The pipe manages the buffer all the way back to the reader, so you will not wind up with heavy memory usage.

Simple Version

const fs = require('fs');
const parser = require('csv-parser');
const stripBom = require('strip-bom-stream');
const stream = require('stream');

const mySimpleWritable = new stream.Writable({
  objectMode: true, // Because input is object from csv-parser
  write(chunk, encoding, done) { // Required
    // chunk is object with data from a line in the csv
    console.log('chunk', chunk)
    done();
  },
  final(done) { // Optional
    // last place to clean up when done
    done();
  }
});
fs.createReadStream(fileNameFull).pipe(stripBom()).pipe(parser()).pipe(mySimpleWritable)

Class Version

const fs = require('fs');
const parser = require('csv-parser');
const stripBom = require('strip-bom-stream');
const stream = require('stream');
// Create writable class
class MyWritable extends stream.Writable {
  // Used to set object mode because we get an object piped in from csv-parser
  constructor(another_variable, options) {
    // Calls the stream.Writable() constructor.
    super({ ...options, objectMode: true });
    // additional information if you want
    this.another_variable = another_variable
  }
  // The write method
  // Called over and over, for each line in the csv
  async _write(chunk, encoding, done) {
    // The chunk will be a line of your csv as an object
    console.log('Chunk Data', this.another_variable, chunk)

    // demonstrate await call
    // This will pause the process until it is finished
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Very important to add.  Keeps the pipe buffers correct.  Will load the next line of data
    done();
  };
  // Gets called when all lines have been read
  async _final(done) {
    // Can do more calls here with left over information in the class
    console.log('clean up')
    // lets pipe know its done and the .on('final') will be called
    done()
  }
}

// Instantiate the new writable class
const myWritable = new MyWritable(somevariable)
// Pipe the read stream to csv-parser, then to your write class
// stripBom is needed because Excel saves csv files in UTF-8 with a BOM
fs.createReadStream(fileNameFull).pipe(stripBom()).pipe(parser()).pipe(myWritable)

// optional
.on('finish', () => {
  // will be called after the writable's internal _final
  console.log('Called very last')
})

OLD METHOD:

PROBLEM WITH readable

const csv = require('csv-parser');
const fs = require('fs');

const processFileByLine = async(fileNameFull) => {

  let reading = false

  const rr = fs.createReadStream(fileNameFull)
  .pipe(csv())

  // Magic happens here
  rr.on('readable', async function(){
    // Called once when data starts flowing
    console.log('starting readable')

    // Found this might be called a second time for some reason
    // This will stop that event from happening
    if (reading) {
      console.log('ignoring reading')
      return
    }
    reading = true
    
    let data
    while (null !== (data = rr.read())) {
      // data variable will be an object with information from the line it read
      // PROCESS DATA HERE
      console.log('new line of data', data)
    }

    // All lines have been read and file is done.
    // End event will be called about now so that code will run before below code

    console.log('Finished readable')
  })


  rr.on("end", function () {
    // File has finished being read
    console.log('closing file')
  });

  rr.on("error", err => {
    // Some basic error handling for fs error events
    console.log('error', err);
  });
}

You will notice a reading flag. I have noticed that, for some reason, right near the end of the file .on('readable') gets called a second time on both small and large files. I am unsure why, but the flag blocks a second pass from reading the same lines.

BrinkDaDrink
  • 1,717
  • 2
  • 23
  • 32
3

This is my solution for getting a csv file from an external URL:

const parse = require('csv-parse/lib/sync');
const axios = require('axios');

const readCSV = (module.exports.readCSV = async (path) => {
  try {
    const res = await axios({ url: path, method: 'GET', responseType: 'blob' });
    let records = parse(res.data, {
      columns: true,
      skip_empty_lines: true
    });

    return records;
  } catch (e) {
    console.log('err');
  }
});

readCSV('https://urltofilecsv');
Andrea Perdicchia
  • 2,786
  • 1
  • 20
  • 19
  • `const parse = require( 'csv-parse/lib/sync' );` didn't work for me but `const parse = require( 'csv-parse/sync' ).parse;` after looking into package.json file of the module – Ijaz Ur Rahim May 18 '23 at 23:34
2

I was using csv-parse, but for larger files I was running into performance issues. One of the better libraries I have found is Papa Parse: the docs are good, it has good support, and it is lightweight with no dependencies.

Install papaparse

npm install papaparse

Usage:

  • async / await
const fs = require('fs');
const Papa = require('papaparse');

const csvFilePath = 'data/test.csv'

// Function to read csv which returns a promise so you can do async / await.

const readCSV = async (filePath) => {
  const csvFile = fs.readFileSync(filePath)
  const csvData = csvFile.toString()  
  return new Promise(resolve => {
    Papa.parse(csvData, {
      header: true,
      transformHeader: header => header.trim(),
      complete: results => {
        console.log('Complete', results.data.length, 'records.'); 
        resolve(results.data);
      }
    });
  });
};

const test = async () => {
  let parsedData = await readCSV(csvFilePath); 
}

test()
  • callback
const fs = require('fs');
const Papa = require('papaparse');

const csvFilePath = 'data/test.csv'

const file = fs.createReadStream(csvFilePath);

var csvData=[];
Papa.parse(file, {
  header: true,
  transformHeader: header => header.trim(),
  step: function(result) {
    csvData.push(result.data)
  },
  complete: function(results, file) {
    console.log('Complete', csvData.length, 'records.'); 
  }
});

Note that header: true is an option in the config; see the docs for other options.

Glen Thompson
  • 9,071
  • 4
  • 54
  • 50
1

Try the line-by-line npm package.

npm install line-by-line --save
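Since a comment below asks for usage code, here is a minimal sketch of how the line-by-line reader is typically used for this case, pausing between lines while a slow operation runs; the file name and the doSomething() helper are hypothetical:

var LineByLineReader = require('line-by-line');
var lr = new LineByLineReader('myfile.csv'); // hypothetical file name

lr.on('line', function (line) {
    // pause emitting further 'line' events while the slow operation runs
    lr.pause();
    doSomething(line, function () { // doSomething() is a hypothetical async operation
        lr.resume();                // continue with the next line when done
    });
});

lr.on('end', function () {
    // all lines have been read
});

lr.on('error', function (err) {
    console.log(err);
});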
josliber
  • 43,891
  • 12
  • 98
  • 133
nickast
  • 63
  • 1
  • 6
  • 6
    Installing a plugin wasn't the question that was asked. Adding some code to explain how to use the plugin and/or explain why the OP should use it would be *far* more beneficial. – domdambrogia Apr 27 '18 at 17:52
1

I've done this using a promise-based approach:

const fs = require('fs')
const { parse } = require('csv-parse')

function readFile(path) {
    return new Promise((resolve, reject) => {
        fs.readFile(path, function (err, fileData) {
            if (err) {
                return reject(err)
            }
            parse(fileData, {columns: false, trim: true}, function (err, rows) {
                if (err) {
                    return reject(err)
                }
                resolve(rows)
            })
        })
    })
}
0
var fs = require('fs');
fs.readFile('FILENAME WITH PATH', 'utf8', function (err, content) {
    if (err) {
        console.log('error occurred ' + JSON.stringify(err));
        return;
    }
    console.log('File contents are ' + JSON.stringify(content));
});
swapnil
  • 11
  • 3
0

You can convert csv to json format using a csv-to-json module, and then you can easily use the json data in your program.
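A short sketch of that flow, assuming the package meant here is csvtojson (shown in an earlier answer); the file name is a placeholder:

const csvtojson = require('csvtojson');

// convert the csv file to an array of plain objects, then work with that array
csvtojson()
    .fromFile('data.csv') // placeholder file name
    .then((rows) => {
        rows.forEach((row) => {
            // each row is an object keyed by the csv headers
            console.log(row);
        });
    });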

Anuj Kumar
  • 17
  • 2
0

csv-parse currently supports async iterators, which should fit your use case nicely.
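A minimal sketch of the async-iterator style with csv-parse, assuming a local data.csv and a hypothetical slow processRow() operation:

const fs = require('fs');
const { parse } = require('csv-parse');

(async () => {
    // pipe the file through the parser and iterate record by record
    const parser = fs.createReadStream('data.csv').pipe(parse({ columns: true }));

    for await (const record of parser) {
        // the loop does not pull the next record until this await resolves,
        // so slow per-row work naturally throttles the read
        await processRow(record); // processRow() is a hypothetical async operation
    }
})();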

Adwin Ang
  • 79
  • 4
-2

npm install csv

Sample CSV file: You're going to need a CSV file to parse, so either you have one already, or you can copy the text below and paste it into a new file called "mycsv.csv":

ABC, 123, Fudge
532, CWE, ICECREAM
8023, POOP, DOGS
441, CHEESE, CARMEL
221, ABC, HOUSE

Sample code for reading and parsing the CSV file

Create a new file, and insert the following code into it. Make sure to read through what is going on behind the scenes.

var csv = require('csv');
// loads the csv module referenced above.

var obj = csv();
// gets the csv module to access the required functionality

function MyCSV(Fone, Ftwo, Fthree) {
    this.FieldOne = Fone;
    this.FieldTwo = Ftwo;
    this.FieldThree = Fthree;
};
// Define the MyCSV object with a parameterized constructor; this will be used for storing the data read from the csv into an array of MyCSV. You will need to define each field as shown above.

var MyData = [];
// MyData array will contain the data from the CSV file and it will be sent to the client's request over HTTP.

obj.from.path('../THEPATHINYOURPROJECT/TOTHE/csv_FILE_YOU_WANT_TO_LOAD.csv').to.array(function (data) {
    for (var index = 0; index < data.length; index++) {
        MyData.push(new MyCSV(data[index][0], data[index][1], data[index][2]));
    }
    console.log(MyData);
});
// Reads the CSV file from the path you specify; the data is stored in the array we specified using the callback function. Each line from the CSV file is pushed as a record to the MyData array, and the data is logged to the console to ensure it worked.

var http = require('http');
//Load the http module.

var server = http.createServer(function (req, resp) {
    resp.writeHead(200, { 'content-type': 'application/json' });
    resp.end(JSON.stringify(MyData));
});
// Create a webserver with a request listener callback.  This will write the response header with the content type as json, and end the response by sending the MyData array in JSON format.

server.listen(8080);
// Tells the webserver to listen on port 8080 (obviously this may be whatever port you want).
Things to be aware of in your app.js code
In the MyCSV function we define the record object and its field names.

If your CSV file has multiple columns, make sure you define this correctly to match your file.

In the obj.from.path(...) call we define the location of the CSV file we are loading. Make sure you use the correct path here.

Start your app and verify functionality: open a console and type the following command:

node app

You should see the following output in your console:

[  MYCSV { Fieldone: 'ABC', Fieldtwo: '123', Fieldthree: 'Fudge' },
   MYCSV { Fieldone: '532', Fieldtwo: 'CWE', Fieldthree: 'ICECREAM' },
   MYCSV { Fieldone: '8023', Fieldtwo: 'POOP', Fieldthree: 'DOGS' },
   MYCSV { Fieldone: '441', Fieldtwo: 'CHEESE', Fieldthree: 'CARMEL' },
   MYCSV { Fieldone: '221', Fieldtwo: 'ABC', Fieldthree: 'HOUSE' }, ]

Now you should open a web browser and navigate to your server. You should see it output the data in JSON format.

Conclusion: Using Node.js and its csv module, we can quickly and easily read and use data stored on the server and make it available to the client upon request.

Rubin bhandari
  • 1,873
  • 15
  • 20