3

I have a 70 MB .csv file that I want to parse and convert to JSON. Testing the conversion on a 500 KB sample CSV, I found an easy solution with regex.
The problem is that with my actual data I can no longer use fs.readFileSync, so I need to work with streams.
My question is: how can I combine streams and regex? Suppose the stream cuts the buffer in the middle of a possible regex match; I think that data would be lost (see the sketch below). Also, the data isn't structured, so I don't see any way to parse it other than with regexes.
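
To illustrate, here is a rough sketch of the only workaround I can think of: carrying the tail of each chunk over to the next one. The file name and the pattern are placeholders, and it assumes records are newline-delimited:

const fs = require('fs')

// placeholder pattern; my real regex is more complex
const recordRegex = /[^\n]+/g

let remainder = '' // tail of the previous chunk; a match could be split here

fs.createReadStream('data.csv', { encoding: 'utf8' })
  .on('data', (chunk) => {
    const text = remainder + chunk
    // only search up to the last newline; keep the rest for the next chunk
    const cut = text.lastIndexOf('\n') + 1
    remainder = text.slice(cut)
    const searchable = text.slice(0, cut)
    let match
    while ((match = recordRegex.exec(searchable)) !== null) {
      // handle match[0] here
    }
  })
  .on('end', () => {
    // whatever is left after the last newline still needs a pass
    let match
    while ((match = recordRegex.exec(remainder)) !== null) {
      // handle the final match(es)
    }
  })

This feels error-prone, which is why I'm asking whether there is a cleaner way.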
Please let me know if my explanation is unclear. English isn't my first language, but I know the English-speaking community is the biggest, fastest, and most reliable.

Thanks in advance.

Lucas Janon
  • Out of curiosity, is there a reason you're using Node for this? In my experience, Python or R is much better suited for the task. – spicypumpkin Aug 11 '17 at 02:36
  • Why would the stream cut the buffer? Read line by line, like this: https://stackoverflow.com/questions/16010915/parsing-huge-logfiles-in-node-js-read-in-line-by-line – Amit Yadav Aug 11 '17 at 02:36
  • @spicypumpkin Because it's a one-time task and I'm more familiar with JS. – Lucas Janon Aug 11 '17 at 03:49

1 Answer

8

There is a stable readline core module, and you can do this:

const readline = require('readline')
const fs = require('fs')

// readline emits complete lines, so a regex match can never be
// split across a chunk boundary
const lineReader = readline.createInterface({
  input: fs.createReadStream('file.csv')
})

lineReader.on('line', (line) => {
  // run your regexes against each complete line here
})
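
For the original CSV-to-JSON task, a fuller sketch could look like the following; the row pattern and field names are hypothetical, since they depend on the actual data:

const fs = require('fs')
const readline = require('readline')

const lineReader = readline.createInterface({
  input: fs.createReadStream('file.csv')
})

const records = []
// hypothetical pattern: three comma-separated fields per row
const rowRegex = /^([^,]*),([^,]*),([^,]*)$/

lineReader.on('line', (line) => {
  const match = rowRegex.exec(line)
  if (match) {
    records.push({ fieldA: match[1], fieldB: match[2], fieldC: match[3] })
  }
})

lineReader.on('close', () => {
  // all lines have been processed; write the result once
  fs.writeFileSync('file.json', JSON.stringify(records))
})

Accumulating everything in memory is fine at 70 MB; for much larger files you would stream the JSON output as well.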
raksa