Is there a decent CSV parser library for JavaScript? So far I've used this and that solution. In the first solution, a new line never starts a new sub-array (the code itself says as much), and the second solution does not work on text files formatted on Windows with <CR><LF>, i.e. \r\n.

Is it sufficient to apply

text = text.replace(/\r/g, "");

to the Windows CSV files? This actually works, but I think this is a little bit quirks. Are there csv parser which are more common than a random bloggers solution?

Konrad Reiche

4 Answers

Here's the 'easy' solution

csv.split(/\r\n|\r|\n/g)

It handles:

  • \n
  • \r
  • \r\n
  • \n\r (matched as \n followed by \r, i.e. two line breaks, so it leaves an empty entry between them)
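A quick check of the one-liner against each line-ending style, wrapped in a helper named `splitLines` purely for illustration. Note that `\n\r` is really matched as two separate breaks, and that a final line break leaves a trailing empty entry:

```javascript
// The one-line split from above, wrapped in a helper for testing.
const splitLines = (csv) => csv.split(/\r\n|\r|\n/g);

console.log(JSON.stringify(splitLines("a\nb")));   // ["a","b"]
console.log(JSON.stringify(splitLines("a\rb")));   // ["a","b"]
console.log(JSON.stringify(splitLines("a\r\nb"))); // ["a","b"]
// "\n\r" matches "\n" and then "\r": two breaks, one empty entry.
console.log(JSON.stringify(splitLines("a\n\rb"))); // ["a","","b"]
// A trailing line break also yields a trailing empty entry.
console.log(JSON.stringify(splitLines("a\nb\n"))); // ["a","b",""]
```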

Unfortunately, it breaks on field values that contain newline characters.

For example, this entry:

"this is some","valid CSV data","with a \r\nnewline char"

will break it, because the \r\n inside the quoted field is mistakenly interpreted as the end of a record.
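The failure is easy to reproduce, again with an illustrative `splitLines` helper and assuming the `\r\n` in the example is a real carriage return and line feed: the single record comes back as two pieces, each with an unbalanced quote.

```javascript
// The same one-line split, applied to a single valid CSV record whose
// third (quoted) field contains a real \r\n.
const splitLines = (csv) => csv.split(/\r\n|\r|\n/g);

const record = '"this is some","valid CSV data","with a \r\nnewline char"';
const lines = splitLines(record);

console.log(lines.length); // 2, although there is only one record
console.log(JSON.stringify(lines));
// ["\"this is some\",\"valid CSV data\",\"with a ","newline char\""]
```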

For a complete solution, your best bet is to write a finite state machine (FSM) lexer/parser. In terms of the Chomsky hierarchy, CSV can be described by a Type III (regular) grammar, so no stack or recursion is needed: processing char-by-char or token-by-token while tracking the current state (for example, whether you are inside a quoted field) is enough.
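A minimal sketch of such a parser, written as a deterministic state machine with four states. It is not the library mentioned below; `parseCsv` and its state names are hypothetical, and it deliberately skips custom delimiters, whitespace trimming, and error reporting for malformed quotes (stray quotes are kept as literal text instead):

```javascript
// Sketch: a character-by-character CSV parser written as a small
// deterministic finite state machine. Handles comma-separated fields,
// double-quoted fields (which may contain commas, line breaks and ""
// escaped quotes), and records ended by \r\n, \n or \r outside quotes.
function parseCsv(text) {
  const FIELD_START = 0;     // at the start of a field
  const UNQUOTED = 1;        // inside a field that did not start with "
  const QUOTED = 2;          // inside a "quoted" field
  const QUOTE_IN_QUOTED = 3; // just read a " while in a quoted field

  const records = [];
  let record = [];
  let field = "";
  let state = FIELD_START;

  const endField = () => {
    record.push(field);
    field = "";
    state = FIELD_START;
  };
  const endRecord = () => {
    endField();
    records.push(record);
    record = [];
  };

  for (let i = 0; i < text.length; i++) {
    const c = text[i];

    if (state === QUOTED) {
      if (c === '"') state = QUOTE_IN_QUOTED;
      else field += c; // commas and line breaks are literal here
      continue;
    }
    if (state === QUOTE_IN_QUOTED && c === '"') {
      field += '"'; // "" inside quotes is an escaped quote
      state = QUOTED;
      continue;
    }

    // Outside quotes (or right after a closing quote).
    if (c === ",") {
      endField();
    } else if (c === "\r" || c === "\n") {
      endRecord();
      if (c === "\r" && text[i + 1] === "\n") i++; // \r\n is one break
    } else if (c === '"' && state === FIELD_START) {
      state = QUOTED;
    } else {
      field += c; // lenient: stray quotes / text after a closing quote kept
      state = UNQUOTED;
    }
  }

  // Flush the last record unless the input was empty or ended with a break.
  if (state !== FIELD_START || record.length > 0) endRecord();
  return records;
}

const sample =
  '"this is some","valid CSV data","with a \r\nnewline char"\r\nnext,record\r\n';
console.log(JSON.stringify(parseCsv(sample)));
// [["this is some","valid CSV data","with a \r\nnewline char"],["next","record"]]
```

Because CSV has no nesting, the state only has to answer "am I inside quotes, and did I just see a quote?", which is why a handful of states suffices where a JSON parser needs recursion.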

I have a fully RFC 4180-compliant client-side library available, but somehow I attracted the attention of a delete-happy mod for external linking. There's a link in my profile if you're interested; otherwise, good luck.

Fair warning from experience: CSV looks deceptively easy on the surface. After studying dozens, if not hundreds, of implementations, I have seen only three JavaScript parsers that did a reasonable job of meeting the spec, and none of them were completely RFC-compliant. I managed to write one, but only with the help of the community and lots and lots of pain.

Evan Plaice

If you're working in Node, there's an excellent CSV parser that can handle extremely large amounts of data (files over a gigabyte) and supports escape characters.

If you're working in browser JS, you could still extract the processing logic from the code so that it operates on a string (instead of a Node Stream).
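As a rough illustration of the stream-versus-string point (this is not the code of the parser referenced above, whose link is missing here), a CSV state machine can keep its state on an object so that it accepts text in arbitrary chunks. `CsvChunkParser`, `push`, and `end` are hypothetical names:

```javascript
// Sketch: a chunk-fed CSV state machine. All parser state lives on the
// object, so text can arrive in arbitrary pieces (e.g. stream 'data'
// events) or as one whole string.
class CsvChunkParser {
  constructor(onRecord) {
    this.onRecord = onRecord; // called with each finished record (an array)
    this.record = [];
    this.field = "";
    this.state = "start"; // "start" | "unquoted" | "quoted" | "quote"
    this.skipLF = false;  // true right after a record-ending \r
  }

  // Feeds the next piece of text; records are emitted as they complete.
  push(chunk) {
    for (const c of chunk) {
      if (this.skipLF) {
        this.skipLF = false;
        if (c === "\n") continue; // rest of a \r\n, possibly in a new chunk
      }
      if (this.state === "quoted") {
        if (c === '"') this.state = "quote";
        else this.field += c; // commas and line breaks are literal in quotes
      } else if (this.state === "quote" && c === '"') {
        this.field += '"'; // "" is an escaped quote
        this.state = "quoted";
      } else if (c === '"' && this.state === "start") {
        this.state = "quoted";
      } else {
        this.plain(c);
      }
    }
  }

  // Handles one character outside of quotes (or after a closing quote).
  plain(c) {
    if (c === ",") {
      this.endField();
    } else if (c === "\r" || c === "\n") {
      this.endRecord();
      this.skipLF = c === "\r";
    } else {
      this.field += c; // lenient: stray quotes are kept as text
      this.state = "unquoted";
    }
  }

  endField() {
    this.record.push(this.field);
    this.field = "";
    this.state = "start";
  }

  endRecord() {
    this.endField();
    this.onRecord(this.record);
    this.record = [];
  }

  // Flushes a final record that has no trailing line break.
  end() {
    if (this.state !== "start" || this.record.length > 0) this.endRecord();
  }
}

const rows = [];
const parser = new CsvChunkParser((record) => rows.push(record));
// Chunk boundaries deliberately fall between \r and \n and inside quotes.
for (const chunk of ["id,note\r", '\n1,"multi\r', '\nline"\r\n2,', "plain"]) {
  parser.push(chunk);
}
parser.end();
console.log(JSON.stringify(rows));
// [["id","note"],["1","multi\r\nline"],["2","plain"]]
```

In Node, the same object could be fed from `fs.createReadStream(path, { encoding: "utf8" })` by calling `parser.push(chunk)` in the `'data'` handler and `parser.end()` in the `'end'` handler; with an encoding set, the stream emits strings and does not split multi-byte UTF-8 characters across chunks. In the browser, a single `push(text)` followed by `end()` gives the same records.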

josh3736

Here is one way to do it:

// based on json_parse from "JavaScript: The Good Parts" by D. Crockford
var csv_parse = function () {
    var at,
        ch,
        text,
        error = function (m) {
            throw {
                name: 'SyntaxError',
                message: m,
                at: at,
                text: text  
            };
        },
        next = function (c) {
            if (c && c !== ch) {
                error("Expected '" + c + "' instead of '" + ch + "'");
            }

            ch = text.charAt(at);
            at += 1;
            return ch;
        },
        //needed to handle "" which indicates escaped quote
        peek = function () {
            return text.charAt(at); 
        },
        white = function () {
            while (ch && ch <= ' ' && ch !== '\n') {
                next();
            }
        },
        // if numeric, then return number
        number = function () {
            var number,
                string = word();

            number = +string;
            if (isNaN(number)) {
                return string;
            } else {
                return number;
            }
        },
        word = function () {
            var string = '';
            // stop at a delimiter, at the end of the input (ch === ''),
            // or at \r, so a \r\n line ending does not leave a \r in the field
            while (ch && ch !== ',' && ch !== '\n' && ch !== '\r') {
                string += ch;
                next();
            }
            return string;
        },
        // the matching " is the end of word not ,
        // need to worry about "", which is escaped quote
        quoted = function () {
            var string = '';

            if (ch === '"') {
                while (next()) {
                    if (ch === '"') {
                        // need to know ending quote or escaped quote ("")
                        if (peek() === '"') {
                            next('"');
                            string += ch;
                        } else {
                            next('"');
                            return string;
                        }
                    } else {
                        string += ch;
                    }
                }
                return string;
            }
            error("Bad string");
        },
        value = function () {
            white();

            switch (ch) {
            case '-':
                return number();
            case '"':
                return quoted();
            default:
                return ch >= '0' && ch <= '9' ? number() : word();
            }
        },
        line = function () {
            var array = [];
            white();
            if (!ch) {
                return array; // empty input
            }
            if (ch === '\n') {
                next('\n');
                return array; // empty line: []
            }
            for (;;) {
                array.push(value());
                white();
                // the record ends at a newline or at the end of the input
                if (!ch || ch === '\n') {
                    if (ch) {
                        next('\n');
                    }
                    return array; // got something
                }
                next(','); // not very liberal with delimiter
                white();
            }
        };


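    // Parses a single CSV record (one line) into an array; a second line
    // in the input triggers error("Syntax error").
    return function (_line) {
        var result;
        text = _line;
        at = 0;
        ch = ' ';
        result = line();
        white();
        if (ch) {
            error("Syntax error");
        }
        return result;
    };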
}();

jGc

My function is solid: just drop it in and use it. I hope it helps.

csvToArray v1.3

A compact (508-byte) but compliant function that converts a CSV string into a 2D array, conforming to the RFC 4180 standard.

http://code.google.com/p/csv-to-array/

Common usage: jQuery

$.ajax({
    url: "test.csv",
    dataType: 'text',
    cache: false
}).done(function (csvAsString) {
    var csvAsArray = csvAsString.csvToArray();
    // work with csvAsArray here
});

Common usage: JavaScript

csvAsArray = csvAsString.csvToArray();

Override field separator

csvAsArray = csvAsString.csvToArray("|");

Override record separator

csvAsArray = csvAsString.csvToArray("", "#");

Override skip header

csvAsArray = csvAsString.csvToArray("", "", 1);

Override all

csvAsArray = csvAsString.csvToArray("|", "#", 1);
dt192