
I'm looking to turn [a, b, c, "d, e, f", g, h] into an array of 6 elements: a, b, c, "d,e,f", g, h. I'm trying to do this in JavaScript. This is what I have so far:

str = str.split(/,+|"[^"]+"/g); 

But right now it's splitting out everything that's in the double-quotes, which is incorrect.

Edit: Okay, sorry, I worded this question really poorly. I'm being given a string, not an array.

var str = 'a, b, c, "d, e, f", g, h';

And I want to turn that into an array using something like the "split" function.

Dale K
jpecht
  • Regex isn't really the best tool for this, since regular expressions don't save state. – Amber Jul 12 '12 at 17:05
  • @Amber: Then what is the best tool? – gen_Eric Jul 12 '12 at 17:08
  • String manipulations, of course! I'm cooking up an answer now... – Elliot Bonneville Jul 12 '12 at 17:08
  • Possible duplicate of [How can I parse a CSV string with Javascript, which contains comma in data?](https://stackoverflow.com/questions/8493195/how-can-i-parse-a-csv-string-with-javascript-which-contains-comma-in-data) – LWC Sep 21 '17 at 08:35
  • If for some reason the answers here don't work with your specific use case, as was the case for me, you can try the answer in this duplicate : https://stackoverflow.com/questions/23582276/split-string-by-comma-but-ignore-commas-inside-quotes/23582323 – Mathieu de Lorimier Apr 30 '19 at 14:29
  • If this solution doesn't work, I recommend this other solution: https://stackoverflow.com/questions/57576681/how-can-i-split-by-commas-while-ignoring-any-comma-thats-inside-quotes/57576855#57576855 – Matt123 Aug 20 '19 at 16:31

18 Answers

Here's what I would do.

var str = 'a, b, c, "d, e, f", g, h';
var arr = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g);

/* will match:

    (
        ".*?"       double quotes + anything but double quotes + double quotes
        |           OR
        [^",\s]+    1 or more characters excl. double quotes, comma or spaces of any kind
    )
    (?=             FOLLOWED BY
        \s*,        0 or more empty spaces and a comma
        |           OR
        \s*$        0 or more empty spaces and nothing else (end of string)
    )
    
*/
arr = arr || [];
// this will prevent JS from throwing an error in
// the below loop when there are no matches
for (var i = 0; i < arr.length; i++) console.log('arr['+i+'] =',arr[i]);
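Several comments below note that the quoted match comes back with its surrounding double quotes (e.g. "d, e, f" including the quote characters). A minimal sketch of one way to strip them, assuming you want the bare values; it reuses the regex above unchanged, and the function name splitKeepingQuotedCommas is hypothetical:

```javascript
// Hypothetical helper: match with the regex above, then remove one pair of
// wrapping double quotes from each match.
function splitKeepingQuotedCommas(str) {
  // match() returns null when nothing matches, so fall back to []
  const matches = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g) || [];
  // '"d, e, f"' -> 'd, e, f'; unquoted matches are left as they are
  return matches.map(m => m.replace(/^"(.*)"$/, '$1'));
}

console.log(splitKeepingQuotedCommas('a, b, c, "d, e, f", g, h'));
// [ 'a', 'b', 'c', 'd, e, f', 'g', 'h' ]
```

It inherits the regex's limits discussed in the comments: empty fields are skipped, and for an unquoted value containing spaces only the last word is kept.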
DecPK
inhan
  • Awesome regexp mate. But isn't `/".*"|[^,"\s]+/` enough? –  Apr 21 '14 at 11:22
  • This won't work on a string like this one: `'Hello World, b, c, "d, e, f", c'`. It returns `["World","b","c","d, e, f", "c"]` – m.spyratos Apr 21 '15 at 04:12
  • Good but splits on spaces between words, I amended it to `/(".*?"|[^\s",][^",]+[^\s",])(?=\s*,|\s*$)/` – Joel Mitchell Jun 27 '15 at 17:57
  • to make it work with spaces in between, use the updated form : `(".*?"|[^",]+)(?=\s*,|\s*$)` , see [this](https://regex101.com/r/lC7iK5/1) – arkoak Dec 09 '15 at 08:35
  • Does not work when the first column has no data (Export from excel) `,col2_val,col3_val` – Andrew Dec 09 '15 at 20:27
  • doesn't work for strings like 'a, b, c, blah "d,e,f" blah, g, h' . If something precedes or succeeds quoted content the output arr doesn't seem to have it – vatsa Jan 21 '16 at 21:25
  • There are extra " for the part within " ". For example, the string 'a, b, c, "d, e, f", g, h' become array ["a", "b", "c", ""d, e, f"", "g", "h"]. – zhihong Apr 20 '17 at 13:23
  • The above is great but it returns "d,e,f" in quotes. Rather than tearing my hair out trying to fix the regex, I did this - var regexp = /(".*?"|[^",\s]+)(?=\s*,|\s*$)/g; var arr = []; var res; while ((res = regexp.exec(str)) !== null) { arr.push(res[0].replace(/(?:^")|(?:"$)/g, '')); } return arr; – martinp999 Jun 01 '17 at 23:05
  • arkoak's solution does not work with empty values, like ,a,b,c,d (which is common in csv with blank values) https://regex101.com/r/lC7iK5/37 ,a,b,c,d should be an array of 5 values, not 4 – Ted Scheckler Jun 20 '19 at 13:11
  • Fixed the issues in my answer: https://stackoverflow.com/a/57121244/2771889 @m.spyratos zhihong martinp99 Andrew Ted Scheckler – thisismydesign Jul 20 '19 at 02:04
  • This also happens to work well for separating a search string by "terms" and "phrases", like `"good fruits" apple banana` -> `['"good fruits"', 'apple', 'banana']`, with a simple modification: `/(".*?"|[^"\s]+)(?=\s*|\s*$)/g` – V. Rubinetti Sep 18 '20 at 16:20
  • Won't work for "Hello",World,"","test" where there is empty string in double quotes – lifetimeLearner007 Mar 03 '21 at 15:09
  • This doesn't work for strings like `"\"Bananas with chocolate\", Sunday Funday, Tuesday Wednesday"` It would return ["\"Bananas with chocolate\"", "Sunday", "Funday", "Tuesday", "Wednesday"] – cmcnphp Mar 21 '22 at 05:44
  • Almost works. But this doesn't match empty values when you have consecutive commas `,,,,,` – stallingOne Mar 24 '22 at 13:09
  • @vatsa for this case try [`str.match(/(?:[^",]+|"[^"]*")+/g).map(e => e.trim())`](https://tio.run/##HYzBDsIgEETv/YoNl4Jd6dXYVD/E1ARxhRpaGiA9@e@4epnJZN7M2@wm2zRv5bifag1UIJcEI7QG4YFgWYPxIJ5I@BL/gOAQfNs0PzpRZpo3ejHFetnL6/l2Fzh1H8E@HYTqeqe43STBeAHSJc2LVGpobFxzDKRDdJJ/1FDrFw) @stallingOne to further get empty matches, extend the regex to [`(?:[^",]+|"[^"]*")+|^(?=,)|(?<=,)`](https://regex101.com/r/4lRfKJ/1) (this contains a look*behind* and requires modern JS). – bobble bubble Apr 16 '23 at 10:34
  • Does not work for `'a, b, , c, "d, e, f", g, h "i",'`. @f-society [answer](https://stackoverflow.com/a/53774647/713573) is working for this case also. Just another round of trim for each is required. – Gagan Apr 28 '23 at 05:18

regex: /,(?=(?:(?:[^"]*"){2})*[^"]*$)/


const input_line = '"2C95699FFC68","201 S BOULEVARDRICHMOND, VA 23220","8299600062754882","2018-09-23"'

let my_split = input_line.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/)

Output: 
my_split[0]: "2C95699FFC68", 
my_split[1]: "201 S BOULEVARDRICHMOND, VA 23220", 
my_split[2]: "8299600062754882", 
my_split[3]: "2018-09-23"

See the following link for an explanation: regexr.com/44u6o
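The lookahead only lets a comma through when the rest of the line contains an even number of double quotes, i.e. when the comma is not inside an open quoted field. A minimal sketch that builds on the same regex and also removes one pair of wrapping quotes from each field (the question raised in the comment about the outside quotes); splitQuotedCsv is a hypothetical name:

```javascript
// Hypothetical helper: split on commas that sit outside quoted fields,
// then strip one pair of surrounding double quotes from each field.
function splitQuotedCsv(line) {
  return line
    // a comma is a split point only if an even number of quotes follows it
    .split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/)
    .map(field => field.replace(/^"(.*)"$/, '$1'));
}

console.log(splitQuotedCsv('"a",,"b, c",'));
// [ 'a', '', 'b, c', '' ]
```

Because it uses split rather than match, empty fields (such as the one between ,,) are kept. The lookahead rescans the rest of the line at every comma, so the work grows roughly with the number of commas times the line length; that is usually fine for single CSV lines.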

DecPK
f-society
  • This worked really nicely for me, but how would it be changed to not include the outside quotes in the result? – ScottFoster1000 Aug 25 '19 at 03:44
  • Modified this a bit to now accept values such as `'http://url-to-something.test/1', "Open, this message?", 'Open you sure you want to open this spam message?'` Sample here: https://regex101.com/r/rzacJ7/1 – Satch Oct 27 '22 at 04:08

Here is a JavaScript function to do it:

function splitCSVButIgnoreCommasInDoublequotes(str) {
    // split the str first,
    // then merge the elements that belong to one double-quoted field
    var delimiter = ',';
    var quotes = '"';
    var elements = str.split(delimiter);
    var newElements = [];
    // an odd number of double quotes means the piece opens (or closes) a quoted field
    function hasOddQuotes(s) {
        return (s.split(quotes).length - 1) % 2 === 1;
    }
    for (var i = 0; i < elements.length; ++i) {
        if (hasOddQuotes(elements[i])) { // the left double quote is found
            var indexOfRightQuotes = -1;
            var tmp = elements[i];
            // find the piece holding the right double quote
            for (var j = i + 1; j < elements.length; ++j) {
                if (hasOddQuotes(elements[j])) {
                    indexOfRightQuotes = j;
                    break;
                }
            }
            if (indexOfRightQuotes !== -1) {
                // found the right double quote:
                // merge all the elements between the double quotes
                for (var k = i + 1; k <= indexOfRightQuotes; ++k) {
                    tmp = tmp + delimiter + elements[k];
                }
                newElements.push(tmp);
                i = indexOfRightQuotes;
            }
            else { // right double quote is not found
                newElements.push(elements[i]);
            }
        }
        else { // no unmatched double quote in this piece
            newElements.push(elements[i]);
        }
    }

    return newElements;
}
Nano S.
shifu.zheng

This works well for me. (I used semicolons so the alert message would show the difference between commas added when turning the array into a string and the actual captured values.)

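REGEX

/"[^"]*"|[^;\s][^;]*/

var str = 'a; b; c; "d; e; f"; g; h; "i"';
// A quoted field is matched whole; any other field runs up to the next
// semicolon and must start with a non-space character, so the space after
// each separator is skipped instead of being glued onto the next field.
var array = str.match(/"[^"]*"|[^;\s][^;]*/g);
alert(array); // a,b,c,"d; e; f",g,h,"i"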
DecPK
John Fisher

Here's a non-regex version that assumes double quotes always come in pairs:

function splitCsv(str) {
  return str.split(',').reduce((accum,curr)=>{
    if(accum.isConcatting) {
      accum.soFar[accum.soFar.length-1] += ','+curr
    } else {
      accum.soFar.push(curr)
    }
    if(curr.split('"').length % 2 == 0) {
      accum.isConcatting= !accum.isConcatting
    }
    return accum;
  },{soFar:[],isConcatting:false}).soFar
}

console.log(splitCsv('asdf,"a,d",fdsa'),' should be ',['asdf','"a,d"','fdsa'])
console.log(splitCsv(',asdf,,fds,'),' should be ',['','asdf','','fds',''])
console.log(splitCsv('asdf,"a,,,d",fdsa'),' should be ',['asdf','"a,,,d"','fdsa'])
Andrew Ulrich

Here's the regex we're using to extract valid arguments from a comma-separated argument list, supporting double-quoted arguments. It handles the following edge cases:

  • doesn't include quotes in the matches
  • works with white spaces in matches
  • works with empty fields

(?<=")[^"]+?(?="(?:\s*?,|\s*?$))|(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))

Proof: https://regex101.com/r/UL8kyy/3/tests (Note: when this was written it only worked in Chrome, because the regex uses lookbehind assertions, which were only added in ECMAScript 2018.)

According to our guidelines, it avoids capturing groups and greedy matching.

I'm sure it can be simplified, I'm open to suggestions / additional test cases.

For anyone interested, the first part matches double-quoted, comma-delimited arguments:

(?<=")[^"]+?(?="(?:\s*?,|\s*?$))

And the second part matches comma-delimited arguments by themselves:

(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))
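For reference, a short usage sketch of the full pattern with String.prototype.match; it assumes an engine with ES2018 lookbehind support (current Node.js and browsers), and the variable name argPattern is illustrative:

```javascript
// The full pattern from above, with the g flag so match() returns every match.
const argPattern = /(?<=")[^"]+?(?="(?:\s*?,|\s*?$))|(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))/g;

// Quoted arguments come back without their quotes; unquoted ones are trimmed.
const args = 'a, b, c, "d, e, f", g, h'.match(argPattern);
console.log(args); // [ 'a', 'b', 'c', 'd, e, f', 'g', 'h' ]
```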

thisismydesign
  • I can't get it to work with empty fields (,,, or ,"","",) so I had to do this first: row = row.split(',').map(p => (p && p || '"_"')).join(','); – Kristian MT May 12 '20 at 14:44
  • You can change the first part to `(?<=")[^"]*?(?="(?:\s*?,|\s*?$))` to match empty params. E.g. `"foo", "", "bar"` will have 3 matches. – thisismydesign May 12 '20 at 15:15
  • @thisismydesign Is it possible to modify this to accept empty values within a CSV file as well? For example, test,,hello,goodbye should be have 4 matches. – Colin Null Sep 30 '20 at 19:18
  • @Colin Null I'm sure it can be done but I don't recommend using this to parse CSV. You'd have to think a lot about edge cases, such as escaping the delimiter. Use a library instead. – thisismydesign Oct 01 '20 at 10:18

I almost liked the accepted answer, but it didn't parse the space correctly, and/or it left the double quotes untrimmed, so here is my function:

    /**
     * Splits the given string into components and returns the components array.
     * Components must be separated by commas.
     * A component that contains one or more commas must be wrapped in double quotes.
     * Double quotes must not be used inside components (replace them with a placeholder such as __double__quotes__, then turn it back into double quotes afterwards).
     *
     * https://stackoverflow.com/questions/11456850/split-a-string-by-commas-but-ignore-commas-within-double-quotes-using-javascript
     */
    function splitComponentsByComma(str){
        var ret = [];
        // match() returns null when there is nothing to match
        var arr = str.match(/(".*?"|[^",]+)(?=\s*,|\s*$)/g) || [];
        for (const element of arr) {
            if ('"' === element[0]) {
                // quoted component: drop the surrounding double quotes
                ret.push(element.slice(1, -1));
            } else {
                // unquoted component: drop the surrounding whitespace
                ret.push(element.trim());
            }
        }
        return ret;
    }
    console.log(splitComponentsByComma('Hello World, b, c, "d, e, f", c')); // [ 'Hello World', 'b', 'c', 'd, e, f', 'c' ]
ling
  • The only problem with this answer (and I came to find it after a long time after I copied it lol) is that it ignores empty entries such as "test1,,test2". Having nothing in between the commas makes your regex skip it. I ended up using @f-society answer at the end. – Raphael Setin Oct 18 '22 at 18:24
  • Hi, can you please point me to @f-society's answer. i am looking to split this 'o,"sadasdasd",123123123,"asdasdasd.www.org,123123,link.com",0,,123' into 7 fields. – Dinakar Ullas Mar 06 '23 at 23:32

Parse a CSV string (one record per line) with TypeScript:

function parseCSV(content: string): string[][] {
    return content.split("\n").map(line =>
        line.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/)
            // strip control characters and every character from \x7F upward
            // (this also removes non-ASCII letters), then trim each field
            .map(field => field.replace(/[\x00-\x08\x0E-\x1F\x7F-\uFFFF]/g, "").trim()));
}

const str = '"abc",jkl,1000,qwerty6000';

parseCSV(str);

output:

[
  ['"abc"', "jkl", "1000", "qwerty6000"]
]
Shantanu Sharma

I know it's a bit long, but here's my take:

var sample="[a, b, c, \"d, e, f\", g, h]";

var inQuotes = false, items = [], currentItem = '';

for(var i = 0; i < sample.length; i++) {
  if (sample[i] == '"') { 
    inQuotes = !inQuotes; 

    if (!inQuotes) {
      if (currentItem.length) items.push(currentItem);
      currentItem = '';
    }

    continue; 
  }

  if ((/^[\"\[\]\,\s]$/gi).test(sample[i]) && !inQuotes) {
    if (currentItem.length) items.push(currentItem);
    currentItem = '';
    continue;
  }

  currentItem += sample[i];
}

if (currentItem.length) items.push(currentItem);

console.log(items);

As a side note, it works both with and without the brackets at the start and end.

Ioannis Karadimas

This takes a CSV file one line at a time and returns an array with commas inside double quotes (speech marks) kept intact. If no double quotes are detected, it just splits on "," as normal.

function parseCSVLine(str){
    // a piece with an odd number of quotes opens (or closes) a quoted field
    function hasOddQuotes(s){
        return (s.split("\"").length - 1) % 2 === 1;
    }
    if(str.indexOf("\"")>-1){
        var aInputSplit = str.split(",");
        var aOutput = [];
        for(var i=0;i<aInputSplit.length;i++){
            if(hasOddQuotes(aInputSplit[i])){
                var sWithCommas = aInputSplit[i];
                var z = i + 1;
                // glue the following pieces back on until the closing quote
                while(z < aInputSplit.length && !hasOddQuotes(aInputSplit[z])){
                    sWithCommas += ","+aInputSplit[z];
                    z++;
                }
                if(z < aInputSplit.length){
                    sWithCommas += ","+aInputSplit[z]; // piece with the closing quote
                }
                aOutput.push(sWithCommas);
                i = z;
            }else{
                aOutput.push(aInputSplit[i]);
            }
        }
        return aOutput;
    }else{
        return str.split(",");
    }
}
JamesHennigan

Use the npm library csv-string to parse the strings instead of split: https://www.npmjs.com/package/csv-string

It also handles empty entries.


Something like a stack should do the trick. Here I use a boolean marker as a minimal stack: it flips on every double quote, and a comma only splits the string while the marker is off.

var str = "a,b,c,blah\"d,=,f\"blah,\"g,h,";
var getAttributes = function(str){
  var result = [];
  var strBuf = '';
  var start = 0 ;
  var marker = false;
  for (var i = 0; i< str.length; i++){

    if (str[i] === '"'){
      marker = !marker;
    }
    if (str[i] === ',' && !marker){
      result.push(str.substring(start, i));
      start = i+1;
    }
  }
  // the last field runs from the last split point to the end of the string
  result.push(str.substring(start));
  return result;
};

console.log(getAttributes(str));
vatsa


The code below works if your input string is in the same format as stringTocompare. You can run it on https://jsfiddle.net/ to see the output; please refer to the screenshot [1] for the fiddle settings. Alternatively, you can use the regex-based split that is commented out in the code and tweak it to your needs. If you don't want a comma re-inserted between the joined pieces, remove +"," from the line attach=attach+","+actualString[t+1].

var stringTocompare='"Manufacturer","12345","6001","00",,"Calfe,eto,lin","Calfe,edin","4","20","10","07/01/2018","01/01/2006",,,,,,,,"03/31/2004"';

console.log(stringTocompare);

var actualString = stringTocompare.split(',');
console.log("Before");
for (var i = 0; i < actualString.length; i++) {
    console.log(actualString[i]);
}
//var actualString=stringTocompare.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/);
for (var i = 0; i < actualString.length; i++) {
    var flag = 0;
    var x = actualString[i];
    if (x !== null) {
        // the piece opens a quoted field but does not close it
        if (x[0] == '"' && x[x.length - 1] !== '"') {
            var p = 0;
            var t = i;
            var b = i;
            // count the pieces up to (not including) the one that closes the quote
            for (var k = i; k < actualString.length; k++) {
                var y = actualString[k];
                if (y[y.length - 1] !== '"') {
                    p++;
                }
                if (y[y.length - 1] == '"') {
                    flag = 1;
                }
                if (flag == 1)
                    break;
            }
            // glue the next p pieces back on, restoring the commas
            var attach = actualString[t];
            for (var s = p; s > 0; s--) {
                attach = attach + "," + actualString[t + 1];
                t++;
            }
            actualString[i] = attach;
            actualString.splice(b + 1, p);
        }
    }
}
console.log("After");
for (var i = 0; i < actualString.length; i++) {
    console.log(actualString[i]);
}




  [1]: https://i.stack.imgur.com/3FcxM.png
  • you can add a code snippet directly to your answer see also https://meta.stackexchange.com/questions/22186/how-do-i-format-my-code-blocks – tung Apr 30 '18 at 09:57

I solved this with a simple parser.

It simply goes through the string char by char, splitting off a segment when it finds the split_char (e.g. comma), but it also keeps an on/off flag that is toggled by each encapsulator_char (e.g. quote). It doesn't require the encapsulator to be at the start of the field/segment (a,b","c,d would produce 3 segments, with 'b","c' as the second), but it should work for well-formed CSV with escaped encapsulator chars.

function split_except_within(text, split_char, encapsulator_char, escape_char) {
    var start = 0
    var encapsulated = false
    var fields = []
    for (var c = 0; c < text.length; c++) {
        var char = text[c]
        if (char === split_char && ! encapsulated) {
            fields.push(text.substring(start, c))
            start = c+1
        }
        if (char === encapsulator_char && (c === 0 || text[c-1] !== escape_char) )             
            encapsulated = ! encapsulated
    }
    fields.push(text.substring(start))
    return fields
}

https://jsfiddle.net/7hty8Lvr/1/
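A variation on the same char-by-char idea, assuming RFC 4180-style quoting, where a literal quote inside a quoted field is written as two quotes (""). In this sketch a doubled quote inside quotes is kept as data instead of toggling the flag, and the enclosing quotes are dropped from the output; the name splitCsvLine and its default parameters are hypothetical:

```javascript
// Sketch: split one CSV line, honouring quoted fields and "" escapes.
function splitCsvLine(line, delimiter = ',', quote = '"') {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === quote && line[i + 1] === quote) {
        field += quote; // "" inside a quoted field is one literal quote
        i++;            // skip the second quote of the pair
      } else if (ch === quote) {
        inQuotes = false; // closing quote, not copied
      } else {
        field += ch; // delimiters inside quotes are plain data
      }
    } else if (ch === quote) {
      inQuotes = true; // opening quote, not copied
    } else if (ch === delimiter) {
      fields.push(field); // delimiter outside quotes ends the field
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field); // the last field has no trailing delimiter
  return fields;
}

console.log(splitCsvLine('a,"b, ""c""",,d'));
// [ 'a', 'b, "c"', '', 'd' ]
```

Text outside the quotes in the same field is kept, so the space after a delimiter (as in the question's 'a, b, c') stays in the field; trim the results if that matters.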

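const csvSplit = (line) => {
    let splitLine = [];

    var quotesplit = line.split('"');
    var lastindex = quotesplit.length - 1;
    // split evens removing outside quotes, push odds
    quotesplit.forEach((val, index) => {
        if (index % 2 === 0) {
            // glue commas around this segment: one after a closing quote
            // (index > 0) and one before an opening quote (index < lastindex)
            var glue = (index > 0 ? 1 : 0) + (index < lastindex ? 1 : 0);
            // a segment shorter than its glue holds no fields of its own, e.g. the
            // "" before a leading quote or the "," between two quoted values
            if (val.length < glue) return;
            var firstchar = (index == 0) ? 0 : 1;
            var trimmed = (index == lastindex)
                ? val.substring(firstchar)
                : val.slice(firstchar, -1);
            trimmed.split(",").forEach(v => splitLine.push(v));
        } else {
            splitLine.push(val);
        }
    });
    return splitLine;
}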

This works as long as quotes only ever wrap whole values that contain the commas to be ignored (i.e. a CSV file).

If you have input like '1,2,4"2,6",8', it will not work.

a exum

This code supports both single and double quotes, commas inside single- or double-quoted values, and empty values between commas.

let txtArguments = "   1,2,'asd,123', pepe, \"A,B\",,   ";
let acumParam = [];
if(txtArguments.trim().length > 0)
{
    let inSQuotes = false;
    let inDQuotes = false;
    let tmpParam = "";
    for(let i=0; i<txtArguments.length; ++i)
    {
        const char = txtArguments[i];
        // a quote only opens if we are not already inside the other kind of quote
        if(char == "'")
            inSQuotes = inSQuotes ? false : !inDQuotes;
        else if(char == '"')
            inDQuotes = inDQuotes ? false : !inSQuotes;
        if(char == ",")
        {
            // commas inside either kind of quote are kept as data
            if(inSQuotes || inDQuotes)
                tmpParam += char;
            else
            {
                acumParam.push(tmpParam);
                tmpParam = "";
            }
        }
        else
            tmpParam += char;
    }
    acumParam.push(tmpParam);
}
Cheva

Assuming your string really looks like '[a, b, c, "d, e, f", g, h]', I believe this would be an acceptable use case for eval(), as long as each element is a valid JavaScript expression such as a quoted string or a number (a bare name like a would throw a ReferenceError):

myString = 'var myArr = ' + myString;
eval(myString);

console.log(myArr); // will now be an array of elements: a, b, c, "d, e, f", g, h

Edit: As Rocket pointed out, strict mode removes eval's ability to inject variables into the local scope, so you'd want to do this instead:

var myArr = eval(myString);
Elliot Bonneville

I've had similar issues with this and found no good .NET solution, so I went DIY. NOTE: I also posted this as an answer to

Splitting comma separated string, ignore commas in quotes, but allow strings with one double quotation

but it seems more applicable here.

In my application I'm parsing a CSV, so my split character is ",". This method only works when you split on a single character.

So I've written a function that ignores commas within double quotes. It does this by converting the input string into a character array and parsing it char by char.

using System.Collections.Generic;
using System.Text;

public static string[] Splitter_IgnoreQuotes(string stringToSplit)
{
    // grow the result as needed instead of guessing a fixed array size
    var fields = new List<string>();
    var current = new StringBuilder();
    bool insideQuotes = false;
    foreach (char theChar in stringToSplit)
    {
        if (theChar == '"')
        {
            // toggle the quote state; the quote itself is not copied
            insideQuotes = !insideQuotes;
        }
        else if (theChar == ',' && !insideQuotes)
        {
            // a comma outside quotes ends the current field
            fields.Add(current.ToString());
            current.Clear();
        }
        else
        {
            // ordinary character, or a comma inside quotes
            current.Append(theChar);
        }
    }
    fields.Add(current.ToString());
    return fields.ToArray();
}

To suit my application, this function also drops every double-quote character (") from the input, since I don't need them in the output.

Community