I'm scraping a webpage with lots of data on it, formatted as an HTML table. You have to submit a form to generate the table. My node script submits all the permutations of the form, and each time scrapes the resulting table, turning each row into a line of data.
The problem is that when I write the data to a file, writing stops once the file reaches roughly 10MB. Sometimes it's a little less; sometimes a little more. I have tried writing the file as .csv, .json, and .txt, and the same problem occurs each time.
I am using fs to perform this task. The relevant code is:
var fs = require("fs");
var stream = fs.createWriteStream("data.csv"); // can also be .json or .txt

// called once for every scraped table row:
stream.write(line_of_data);
I can console.log(line_of_data) and it works fine all the way through, until there's no data left to scrape. But at around 10MB, the output file won't accept any more lines of data. The stopping point seems almost completely arbitrary: every time I run the script, it stops writing at a different point. I have plenty of storage space on my hard drive, so the problem must have to do with something else.