I have a CSV document with the following format:

NOT_HEADER1|NOT_HEADER2|NOT_HEADER3...
HEADER1|HEADER2|HEADER3|HEADER4|HEADER5|HEADER6
VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6

Because the first line does not contain the actual headers, I skip it with `skipLines: 1`, and the file then parses just fine:

import csv from 'csv-parser';
import fs from 'fs';
const results: any[] = [];

fs.createReadStream('pathToFile')
    .pipe(csv({ separator: '|', skipLines: 1 }))
    .on('headers', (headers) => {
      console.log(headers);
    })
    .on('data', (data) => results.push(data))
    .on('end', () => {
      console.log(results);
    });

The catch is that I also need the first row stored somewhere. I can parse it myself, but I don't know how to tap into the stream, extract and save that row, and let csv-parser take care of the rest.

iDaniel19

1 Answer

You can use readline.createInterface to process the input file line by line. Store the first line and write all others into your csv stream:

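import csv from "csv-parser";
import fs from "fs";
import readline from "readline";

let firstLine; // stays undefined until the first line has been read
// Same options as in the question, but without skipLines:
// the first line never reaches the parser.
const csvStream = csv({...})
.on("headers", ...)
.on("data", ...)
.on("end", () => {
  console.log(firstLine, results);
});
readline.createInterface({
  input: fs.createReadStream("..."),
  crlfDelay: Infinity // always treat \r\n as a single line break
})
.on("line", function(line) {
  if (firstLine === undefined) {
    // Compare with undefined so that an empty first line is still captured.
    firstLine = line;
  } else {
    // readline strips the line terminator, so add it back for the parser.
    csvStream.write(line + "\n");
  }
})
.on("close", function() {
  csvStream.end();
});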
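
Another way to tap into the stream directly, as the question asks, is a small Transform placed in front of csv-parser. The following is only a minimal sketch: `FirstLineExtractor` is a hypothetical name (it is not part of csv-parser or Node), and it assumes the file is UTF-8 with `\n` or `\r\n` line endings.

```typescript
import { Transform, type TransformCallback } from "node:stream";

// Hypothetical helper: captures the first line of a byte stream and
// forwards everything after it unchanged.
class FirstLineExtractor extends Transform {
  firstLine: string | undefined;
  private pending: Buffer = Buffer.alloc(0);

  _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback): void {
    if (this.firstLine !== undefined) {
      callback(null, chunk); // first line already captured: pass through
      return;
    }
    // The first line may span several chunks, so buffer until a newline shows up.
    this.pending = Buffer.concat([this.pending, chunk]);
    const newline = this.pending.indexOf(0x0a); // "\n"; a bytewise search is safe in UTF-8
    if (newline === -1) {
      callback(); // first line not complete yet: emit nothing
      return;
    }
    this.firstLine = this.pending
      .subarray(0, newline)
      .toString("utf8")
      .replace(/\r$/, ""); // drop the \r of a \r\n ending
    const rest = this.pending.subarray(newline + 1);
    this.pending = Buffer.alloc(0);
    callback(null, rest);
  }

  _flush(callback: TransformCallback): void {
    // Input ended without any newline: the whole input is the first line.
    if (this.firstLine === undefined) {
      this.firstLine = this.pending.toString("utf8");
    }
    callback();
  }
}
```

Under these assumptions it could be used as `const tap = new FirstLineExtractor(); fs.createReadStream('pathToFile').pipe(tap).pipe(csv({ separator: '|' }))`, reading `tap.firstLine` in the `'end'` handler. `skipLines` is left out because the first line never reaches the parser, and `pipe` takes care of backpressure between the streams.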
Heiko Theißen
  • I have no idea how I didn't know of 'readline' until now. Thank you. The main idea here is a stream writing into another stream, right? Do you think that would cause a significant performance hit? Another approach would be to read the entire file into memory with `fs.readFile` and process it. – iDaniel19 Mar 10 '23 at 21:58
  • I expect no performance hit. But reading the entire file before starting any further processing can hurt performance. – Heiko Theißen Mar 10 '23 at 22:01
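
For comparison, the in-memory approach mentioned in the comments can be sketched as a plain string split. This is only an illustration for files small enough to hold in memory: `splitFirstLine` is a hypothetical helper name, and the text is assumed to be already decoded (for example read with `fs.readFile` and the `'utf8'` encoding).

```typescript
// Hypothetical helper: splits text into its first line and everything after it.
function splitFirstLine(text: string): { firstLine: string; rest: string } {
  const newline = text.indexOf("\n");
  if (newline === -1) {
    return { firstLine: text, rest: "" }; // single-line input
  }
  return {
    firstLine: text.slice(0, newline).replace(/\r$/, ""), // drop the \r of a \r\n ending
    rest: text.slice(newline + 1),
  };
}
```

The `rest` string could then be handed to the csv-parser stream in one `end(rest)` call, at the cost of holding the whole file in memory before parsing starts.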