
The question is a little confusing to describe, so I will explain it with an example. I am not using any particular language.

I have the following string:

0,1,"foo","blue,yellow,red",27

I need to create an array of these comma-delimited values, but as you can see, the fourth item (index 3) is a quoted string that also contains commas.

I need to get that string as one value, with the quotes intact, like so:

[0, 1, "foo", "blue,yellow,red", 27]

Splitting on the commas won't help me, because it will also split the quoted item. How would I go about parsing this comma-delimited string into a list of items?

Adam Harte
  • You forgot to tag which programming language you are using. – Mark Jun 05 '13 at 02:09
  • Sorry, I just modified my question to say that I am not using any particular language. So answers in any language will do as long as they show the theory. – Adam Harte Jun 05 '13 at 02:10
  • possible duplicate of [regular expression should split , that are contained outside the double quotes in a CSV file?](http://stackoverflow.com/questions/1603096/regular-expression-should-split-that-are-contained-outside-the-double-quotes-i) – Jerry Coffin Jun 05 '13 at 02:13
  • I'd look at using a JSON parser. Slap `[]` on the string and then parse. (You sure it's not JSON you're parsing anyway?) – Hot Licks Jun 05 '13 at 03:15

3 Answers


I don't know what language you are targeting, but the general approach is to read one character at a time, splitting at commas as usual. If you encounter a " as the first character of a new item, set a flag (such as in_quotes). While that flag is set, commas do not end the item: you keep reading characters until the next ", at which point you clear the flag.

paddy

I'd suggest using strtok with a comma as the field separator. However, if the first character of a field is a double quote, use " as the field separator instead.

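Assuming the field layout is fixed (as in a struct), here is C code that prints each field on its own line. Note that it hard-codes the order of plain and quoted fields, and that strtok strips the quotes from the quoted fields: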

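#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "0,1,\"foo\",\"blue,yellow,red\",27";

    printf ( "Input string: %s\n", str );

    char * substr;
    char * str_itr = str;
    char comma[] = ",";
    char quote[] = "\"";

    /* Field 0 is plain, so split on a comma. */
    substr = strtok ( str_itr, comma );
    if ( substr )
        printf ( "%s\n", substr );

    /* Field 1 is plain as well. */
    substr = strtok ( NULL, comma );
    if ( substr )
        printf ( "%s\n", substr );

    /* Field 2 starts with a quote, so split on quotes instead:
       strtok skips the opening quote and stops at the closing one. */
    substr = strtok ( NULL, quote );
    if ( substr )
        printf ( "%s\n", substr );

    /* The first call returns the lone "," between the two quoted
       fields (discarded); the second returns field 3, commas included. */
    substr = strtok ( NULL, quote );
    substr = strtok ( NULL, quote );
    if ( substr )
        printf ( "%s\n", substr );

    /* Field 4: back to commas; strtok skips the leading comma. */
    substr = strtok ( NULL, comma );
    if ( substr )
        printf ( "%s\n", substr );

    return ( 0 );
}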
unxnut

With Perl:

my $s = '0,1,"foo","blue,yellow,red",27';
my @l = grep {defined $_} split(/("[^"]*")|,/, $s);
print join("-" , @l), "\n";

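Output (each extra dash comes from an empty string that `split` returns between a quoted item and the comma next to it):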

0-1--"foo"---"blue,yellow,red"--27
perreal