
I am using the following function to parse a CSV file.

export default function readCsv (csv, reviver) {
  // The reviver receives (rowIndex, colIndex, value) and may transform each cell
  reviver = reviver || function (r, c, v) {
    return v;
  };
  let chars = csv.split(''),
    c = 0,
    cc = chars.length,
    start, end, table = [],
    row;
  while (c < cc) {
    table.push(row = []);
    while (c < cc && '\r' !== chars[c] && '\n' !== chars[c]) {
      start = end = c;
      if ('"' === chars[c]) {
        // quoted field: scan to the closing quote, unescaping "" pairs
        start = end = ++c;
        while (c < cc) {
          if ('"' === chars[c]) {
            if ('"' !== chars[c + 1]) {
              break; // closing quote
            } else {
              chars[++c] = ''; // unescape "" by blanking the second quote
            }
          }
          end = ++c;
        }
        if ('"' === chars[c]) {
          ++c;
        }
        // skip anything between the closing quote and the next delimiter
        while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) {
          ++c;
        }
      } else {
        // unquoted field: scan to the next delimiter
        while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) {
          end = ++c;
        }
      }
      row.push(reviver(table.length - 1, row.length, chars.slice(start, end).join('')));
      if (',' === chars[c]) {
        ++c;
      }
    }
    // consume the line ending (\r, \n, or \r\n)
    if ('\r' === chars[c]) {
      ++c;
    }
    if ('\n' === chars[c]) {
      ++c;
    }
  }
  return table;
}

The json looks like this:

[screenshot of the parsed result: an array of row arrays, with the header row as the first element]

What I want the json to look like is as follows:

[
    {
        doc_id: "16278",
        framework_id: "8078",
        ...
    },
    {
        doc_id: "16261",
        framework_id: "880",
        ...
    }
]

Basically, instead of the first row's content appearing as the first value in the output, the first row should be converted into keys and the rest of the rows into values.

AspiringCanadian

3 Answers


It's relatively trivial to post-process your data to map it into the desired format, i.e. given the output from the CSV reader in variable data:

data = data.slice(1).map(function(row) {
    return row.reduce(function(obj, value, index) {
        var key = data[0][index];  // extract from first original row
        obj[key] = value;
        return obj;
    }, {});
});

i.e. iterate over all rows (skipping the 1st), creating an object based on the keys from the zeroth row and the values from the current.
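For example, with made-up sample data in the shape the CSV reader produces, the mapping above yields:

```javascript
// Hypothetical sample output from the CSV reader: header row first
let data = [
  ['doc_id', 'framework_id'],
  ['16278', '8078'],
  ['16261', '880']
];

data = data.slice(1).map(function (row) {
  return row.reduce(function (obj, value, index) {
    obj[data[0][index]] = value; // key taken from the original header row
    return obj;
  }, {});
});
// data[0] is now { doc_id: '16278', framework_id: '8078' }
```

Note that `data` is only reassigned after `.map()` has finished, so `data[0]` inside the callback still refers to the original header row.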

Note that this will use the original keys in their long form. You may wish to change the values in the first row to make them more normalised first, e.g.:

data[0] = data[0].map(function(key) {
    key = key.replace(/[^\w\d\s]/g, ''); // strip non-alphanum or space
    return key.replace(/\s/g, '_').toLowerCase();
});
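For instance, a header cell like `"Doc ID?"` (a made-up example) would come out as `doc_id`:

```javascript
// Hypothetical header row before normalisation
let header = ['Doc ID?', 'Framework ID'];

header = header.map(function (key) {
  key = key.replace(/[^\w\d\s]/g, ''); // strip non-alphanum or space
  return key.replace(/\s/g, '_').toLowerCase();
});
// header is now ['doc_id', 'framework_id']
```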
Alnitak
  • So mapping the json from csv is not the best way to do a task like this? – AspiringCanadian Jun 01 '17 at 18:39
  • 1
    You don't have "JSON", that's an object serialisation format, just like how CSV is a data format. If you've already got CSV, then your best option (per above) is to use a bog-standard CSV reader that knows nothing about your intended structure, and then manipulate the data into the desired structure. – Alnitak Jun 01 '17 at 18:44
  • My mistake, not sure why I am referring a simple array as json. – AspiringCanadian Jun 01 '17 at 18:46
  • You were right. I ended up using papaparse rather than creating my own and then post-processing the data. That would leave a window for bugs. – AspiringCanadian Jun 01 '17 at 21:29

This is an efficient approach:

function csvToKeyedArray(csv) {
  var data = [];
  var keys = csv[0];
  var datum, entry, index;

  for (index in keys) {
    keys[index] = keys[index]
      .replace(/[^\w ]+/g, '') // remove extraneous characters like '?'
      .replace(/ /g, '_') // replace spaces with _
      .toLowerCase();
  }

  for (entry = 1; entry < csv.length; entry++) {
    // initialize an object for each row and add it to the `data` array
    data[entry - 1] = datum = {};

    for (index in keys) {
      // convert rows to objects
      datum[keys[index]] = csv[entry][index];
    }
  }

  return data;
}

data is then populated to your specifications.

This will be preferable to using .map() if your CSV file contains tens of thousands of rows, as the per-iteration function-call overhead can make .map() noticeably slower.

However, if your CSV file is expected to only be a few hundred or thousand rows, then using .map() will likely be preferable for readability and maintainability.
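For comparison, a `.map()`-based version of the same conversion (a sketch producing equivalent output to the loop above, with a hypothetical function name) might look like:

```javascript
function csvToKeyedArrayMapped(csv) {
  // normalise the header row into snake_case keys
  const keys = csv[0].map(key =>
    key.replace(/[^\w ]+/g, '').replace(/ /g, '_').toLowerCase()
  );

  // convert each remaining row into an object keyed by the header
  return csv.slice(1).map(row =>
    row.reduce((datum, value, index) => {
      datum[keys[index]] = value;
      return datum;
    }, {})
  );
}

// e.g. csvToKeyedArrayMapped([['Doc ID', 'Value'], ['16278', '8078']])
//   → [{ doc_id: '16278', value: '8078' }]
```

Unlike the loop version, this one does not mutate the input's header row.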

Patrick Roberts
  • FWIW, I would've just used `data.push(datum)` instead of `data[entry - 1] = datum` ... – Alnitak Jun 01 '17 at 19:03
  • @Alnitak in terms of efficiency, avoiding the function call is preferable. Like I said, if readability and maintainability is more important than efficiency, I deferred to your answer. Yes, it is a micro-optimization, but it does make a considerable difference with large CSV files in the order of magnitude I was describing. – Patrick Roberts Jun 01 '17 at 19:05
  • (referring to the push call specifically) this seems like (false) micro-optimisation to me... https://stackoverflow.com/questions/21034662/why-push-method-is-significantly-slower-than-putting-values-via-array-indices-in – Alnitak Jun 01 '17 at 19:05

Try this one:

const csvData = `
    id,first_name,last_name,email,gender,ip_address
    1,Jami,Dumingos,jdumingos0@opensource.org,Female,209.112.103.56
    2,Brenda,Harbach,bharbach1@addtoany.com,Female,160.201.233.94
    3,Gail,Rowbrey,growbrey2@sakura.ne.jp,Female,160.199.58.40
    4,Ludvig,Coil,lcoil3@fotki.com,Male,158.37.136.163
    5,Lurlene,Conochie,lconochie4@skyrock.com,Female,145.147.12.44
    6,Aldous,Farrey,afarrey5@cargocollective.com,Male,44.148.54.88
    7,Skipp,Sket,ssket6@discovery.com,Male,81.190.215.227
    8,Greg,Wakefield,gwakefield7@yahoo.com,Male,105.157.167.96
    9,Westley,Purton,wpurton8@mapquest.com,Male,169.67.113.22
    10,Dill,Avraam,davraam9@google.de,Male,223.60.54.101`;

function readCsv(csv, splitMark = ',') {
  // trim the whole string and each line, so the template literal's
  // leading newline and indentation don't corrupt the keys
  const lines = csv.trim().split('\n').map(line => line.trim());
  const keys = lines[0].split(splitMark); // first line as heading
  const rows = lines.slice(1);

  return rows.map(row => row
    .split(splitMark)
    .reduce((map, col, index) => {
      map[keys[index]] = col;
      return map;
    }, { }));
}

readCsv(csvData);
embarq
  • Your CSV reading does not appear as resilient (or featureful) as the OP's. – Alnitak Jun 01 '17 at 18:48
  • I agree with you @Alnitak – embarq Jun 01 '17 at 18:49
  • The CSVs would have paragraphs in them, so I can't use line breaks as a parameter to split the content. – AspiringCanadian Jun 01 '17 at 18:50
  • this is why in my answer I recommended using the original CSV reader intact, and then post-processing to the desired structure (using more or less exactly the same method as this). This is (IMHO) the correct "separation of concerns" approach. – Alnitak Jun 01 '17 at 18:51