
I have written a Puppeteer script that scrapes text from a website (it's a GUI that displays the server's log files -- it's a Squarespace site, so that's the only way I can access the log files).

This is how the lines appear in the GUI: (screenshot)

The script reads this row (and every row below it) and currently prints it with console.log().

The record above is output like this:

11/9/2018 at 12:21:44pm70.119.157.106AboutHostname:70.119.157.106Location:Carrollton, Texas, United StatesTags:-Referrer:www.website.com/Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36

And they get placed end-to-end in the output in a giant block like this:

11/9/2018 at 12:21:44pm70.119.157.106AboutHostname:70.119.157.106Location:Carrollton, Texas, United StatesTags:-Referrer:www.website.com/Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.3611/9/2018 at 12:21:33pm70.119.157.106HomesitesHostname:70.119.157.106Location:Carrollton, Texas, United StatesTags:-Referrer:www.website.com/about/Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.3611/9/2018 at 12:21:26pm70.119.157.106AboutHostname:70.119.157.106Location:Carrollton, Texas, United StatesTags:-Referrer:www.website.com/Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.3611/9/2018 at 12:21:15pm70.119.157.106HomeHostname:70.119.157.106Location:Carrollton, Texas, United StatesTags:-Referrer:-Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36

I am now at the step where I'm trying to figure out how to write this to a CSV file instead of printing it to the terminal with console.log(textContent).

What is the best way to write this to a CSV?

(The following step will be parsing all this, but...baby steps...)

reallymemorable
  • Do you want to save the CSV file locally, or are you running the code in the cloud (e.g. the code runs on Lambda and you need to save the CSV to S3)? If you need the file locally, you can create a CSV file and write your data to it: https://stackoverflow.com/questions/24915609/how-to-write-to-csv-file-in-javascript – Persistent Plants Nov 10 '18 at 07:00

1 Answer


You could use a module like csv-stringify, which converts an array of arrays into CSV.

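const fs = require('fs');
// In csv-stringify v6 and later the function is a named export;
// in older versions use require('csv-stringify') directly.
const { stringify } = require('csv-stringify');

// logLine is assumed to hold one record as a comma-separated string.
const input = [];
const line = logLine.split(',');
input.push(line);

stringify(input, function (err, output) {
  if (err) throw err;
  // Current Node versions require a callback for fs.writeFile.
  fs.writeFile('output.csv', output, function (writeErr) {
    if (writeErr) throw writeErr;
  });
});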

Using a module is advisable because it handles the special cases where a CSV field contains characters like , or ".

However, you could do without a module as well. CSV is a dead simple format (apart from those special cases), so you could just join all your values with , and use fs.writeFile or fs.appendFile to write them to the output file.
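For illustration, here is a minimal sketch of the no-module route that still covers the special cases mentioned above. The helper names (escapeCsvField, toCsvLine, appendCsvRow) are made up for this example, and the quoting follows the usual RFC 4180 convention: wrap a field in double quotes if it contains a comma, a quote, or a line break, and double any embedded quotes.

```javascript
const fs = require('fs');

// Quote a field only when it contains a comma, a double quote, or a line break.
// Embedded double quotes are doubled, per the usual CSV convention.
function escapeCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Turn one array of values into a single CSV line (no trailing newline).
function toCsvLine(fields) {
  return fields.map(escapeCsvField).join(',');
}

// Append one row to the file; appendFileSync creates the file if it is missing.
function appendCsvRow(filePath, fields) {
  fs.appendFileSync(filePath, toCsvLine(fields) + '\n');
}
```

Calling appendCsvRow('output.csv', ['11/9/2018 at 12:21:44pm', '70.119.157.106', 'Carrollton, Texas, United States']) would add one row, with the location quoted because it contains commas. The synchronous call keeps the sketch short; fs.appendFile with a callback would work the same way.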

mihai
  • But the data coming in (as in the example I posted) isn't as an array. Are you suggesting that csv-stringify will detect where it needs to separate data? – reallymemorable Nov 12 '18 at 19:50
  • No, you have to do the parsing yourself and decide what goes into the array. In my code I showed a basic example of splitting by comma, but you need to decide what to keep and what to discard from the log. – mihai Nov 12 '18 at 20:11
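As a rough sketch of the "do the parsing yourself" step, the block from the question could be split into rows with a regular expression. The pattern is inferred only from the sample output in the question, so it rests on assumptions: every record starts with a timestamp like 11/9/2018 at 12:21:44pm, the Hostname:, Location:, Tags: and Referrer: labels always appear in that order, and the user agent always starts with Mozilla/. The names DATE, RECORD and parseLogBlock are hypothetical. Because the records are fused without separators, a user agent ending in a digit could in principle blur into the next date, so treat this as a starting point rather than a robust parser.

```javascript
// Start of a timestamp, e.g. "11/9/2018 at " (month limited to 1-12 so that
// the digits ending the previous user agent are less likely to be absorbed).
const DATE = String.raw`(?:1[0-2]|[1-9])/\d{1,2}/\d{4} at `;

const RECORD = new RegExp(
  String.raw`(${DATE}\d{1,2}:\d{2}:\d{2}[ap]m)` + // timestamp
  String.raw`(\d{1,3}(?:\.\d{1,3}){3})` +         // IP address
  String.raw`(.*?)Hostname:(.*?)Location:(.*?)Tags:(.*?)Referrer:(.*?)` +
  String.raw`(Mozilla/.*?)` +                     // user agent (assumed prefix)
  String.raw`(?=${DATE}|$)`,                      // stop at next timestamp or end of input
  'g'
);

// Returns an array of rows; each row is
// [timestamp, ip, page, hostname, location, tags, referrer, userAgent].
function parseLogBlock(block) {
  const rows = [];
  for (const match of block.matchAll(RECORD)) {
    // match[0] is the whole record; match[1]..match[8] are the captured fields.
    rows.push(match.slice(1).map((field) => field.trim()));
  }
  return rows;
}
```

The resulting array of arrays is the shape csv-stringify expects as input. Note that the location field contains commas, which is exactly the kind of value that needs quoting in the CSV output.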