0

I am scraping data from a URL containing a csv. The format of the data I'm scraping is like this:

I am doing this in Node.js and using the nodejs-requestify package: https://www.npmjs.com/package/requestify

When I console the response.getBody() it's in the exact same format as the screenshot provided.

I am trying to convert this to a JSON array that I can iterate over in a loop to insert the values into a database, however, am struggling to get the data into JSON format.

I've tried splitting the array in multiple ways (comma, single quote, double quote). I've tried JSON.parse() and JSON.stringify() (and both in combination).

Here is the code I'm using. Ultimately when I console.log rows in the loop, this is where it should be in JSON format, however, it's just coming in as comma separated values still.

requestify.get('URL').then(function(response) {
  // Get the response body
  var dataBody = response.getBody();
  var lineArray = dataBody.split('\r\n');
  var data = JSON.parse(JSON.stringify(lineArray));

  for(var s = 0; s < data.length; s++) {
    var rows = data[s];
    console.log(rows)
  }
});
doriansm
  • 247
  • 1
  • 5
  • 31
  • In general it's not safe to assume that splitting on line breaks will give you the rows of a CSV. With quoted fields a line break can appear within a field. I strongly recommend using an existing CSV module like [csv-parse](https://www.npmjs.com/package/csv-parse) or [csv-parser](https://www.npmjs.com/package/csv-parser) which will handle the edge cases for you. – Jordan Running May 24 '19 at 16:16
  • 1
    Possible duplicate of [How to convert CSV to JSON in Node.js](https://stackoverflow.com/questions/16831250/how-to-convert-csv-to-json-in-node-js) – Ahmed Yousif May 24 '19 at 16:18
  • OK I will look into csv-parse – doriansm May 24 '19 at 16:21
  • So the lineArray gives you the value you have shown in the image ? One more thing is you are doing JSON.stringify the JSON.parse which is not correct. – Saroj May 24 '19 at 17:24
  • I solved this. Posted as an answer. – doriansm May 24 '19 at 20:39

2 Answers2

0

There is a basic misundertanding I think

var lineArray = dataBody.split('\r\n');

lineArray now contains something like

"a", "b", "c"

but for doing something like

var data = JSON.parse(lineArray);

you need lineArray to be

{ "a":"1", "b":"2", "c":"3" }

I think you need something like

const lineData = lineArray.split(',');
const keys = ["name", "age", "gender"];
const jsonLineData = {};
keys.forEach((key, index) => {
    jsonLineData[key] = lineData(index);
});
Saroj
  • 1,551
  • 2
  • 14
  • 30
Andrea Franchini
  • 548
  • 4
  • 14
0

I solved this by using csvtojson and aws-dsk since my csv is hosted on S3.

async function startAWS(db){
  //Retrieve AWS IAM credentials for the 'master' user
  var awsCredentials;
  try{ 
    awsCredentials = await retrievePromise(config.get('aws')); 
  }
  catch (e) { 
    console.log({error:e},'startAWS error'); 
  }
  //Setup the AWS config to access our S3 bucket
  AWS.config = new AWS.Config({
    accessKeyId :  awsCredentials.principal,
    secretAccessKey :awsCredentials.credential,
    region:'us-east-1'
  });
  //Call S3 and specify bucket and file name
  const S3 = new AWS.S3();
  const params = {
    Bucket: '***',
    Key: '***' //filename
  };
  //Convert csv file to JSON
  async function csvToJSON() {
  // get csv file and create stream
  const stream = S3.getObject(params).createReadStream();
  // convert csv file (stream) to JSON format data
  const json = await csv().fromStream(stream);

  //connect to DB and continue script
  db.getConnection()
  .then(async (conn) => {
    if(json.length) {
      for(var s = 0; s < json.length; s++) {
        var rows = json[s];

        const insert = await conn.query(
          'SQL HERE'
        );
      }
    }
  })
  };
  csvToJSON();
}


doriansm
  • 247
  • 1
  • 5
  • 31