I have a line:

$string = 'Paul,12,"soccer,baseball,hockey",white';

I am trying to split this into an @array that has 4 values, so that

print $array[2];

Gives

soccer,baseball,hockey

How do I do this?

Andy Lester
masterial

7 Answers

11

Just use Text::CSV. As you can see from the source, getting CSV parsing right is quite complicated:

sub _make_regexp_split_column {
    my ($esc, $quot, $sep) = @_;

    if ( $quot eq '' ) {
        return qr/([^\Q$sep\E]*)\Q$sep\E/s;
    }

   qr/(
        \Q$quot\E
            [^\Q$quot$esc\E]*(?:\Q$esc\E[\Q$quot$esc\E0][^\Q$quot$esc\E]*)*
        \Q$quot\E
        | # or
        [^\Q$sep\E]*
       )
       \Q$sep\E
    /xs;
}
singingfish
  • 2
    There are ways to install modules without install permissions. See http://stackoverflow.com/questions/251705/how-can-i-use-a-new-perl-module-without-install-permissions. Also, if the module is pure-Perl, you can simply put it in the same directory as the script, or put it in some directory you have access to and use the `use lib "directory"` construct; see perldoc.perl.org/lib.html. – Joel Berger Feb 13 '11 at 05:23
  • the pure perl implementation is here http://search.cpan.org/perldoc?Text::CSV_PP . You can even click on the "source" link and copy/paste it into a file (being careful about naming). That SHOULD work. – Joel Berger Feb 13 '11 at 05:28
  • How would I do it using Text::CSV? – masterial Feb 13 '11 at 06:56
  • @seaworthy, added an answer http://stackoverflow.com/questions/4982542/how-do-i-split-a-string-into-an-array-by-comma-but-ignore-commas-inside-double-qu/4984786#4984786 – Joel Berger Feb 13 '11 at 17:30
7

The standard module Text::ParseWords will do this as well.

use Text::ParseWords;
my @array = parse_line(q{,}, 0, $string);
oylenshpeegul
  • Interesting, I didn't know this existed. – Joel Berger Feb 13 '11 at 17:38
  • 1
    Text::ParseWords tries to replicate Bourne shell quoting rules, which are not the same as the rules used by most CSV parsers/emitters. In particular, both single and double quotes are significant, and it uses backslash as an escape character. So while it works for this specific example, it's a poor choice for CSV parsing. – cjm Feb 14 '11 at 03:56
  • The hivemind has inferred that this is a question about CSV parsing. The OP may well be using Bourne shell quoting rules, one of the many extant CSV dialects or a scheme of their own making. Who can say? I was merely trying to answer the question that was actually asked. – oylenshpeegul Feb 14 '11 at 14:24
  • You should have at least mentioned that it also ignores commas inside single quotes, as that's not something the OP asked for, and could come as a big surprise if the data includes apostrophes. (Especially since Text::ParseWords returns an empty list for lines with unbalanced single quotes.) – cjm Feb 14 '11 at 23:00
  • I happen to come back to this from time to time, so I thought I might comment. When I need light CSV parsing I sometimes use `parse_line` as above, but first I do `$string =~ s|'|\\'|g;`, which works much more as expected! – Joel Berger Jan 22 '13 at 02:48
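To make the discussion above concrete, here is a minimal sketch of `parse_line` on the question's string, followed by the apostrophe pitfall raised in the comments. The second input line (`O'Brien,...`) and the variable names `@broken` and `@fixed` are made up for illustration; the backslash-escaping step is one reading of the workaround suggested in the last comment, not a general CSV solution.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# The string from the question: the third field contains commas inside double quotes.
my $string = 'Paul,12,"soccer,baseball,hockey",white';

# Arguments: delimiter, keep flag (0 = strip quotes and backslashes), line.
my @array = parse_line(q{,}, 0, $string);
print "$array[2]\n";    # soccer,baseball,hockey

# Hypothetical input with an apostrophe: Text::ParseWords treats the single
# quote as an opening quote that is never closed, so parsing fails.
my $line   = q{O'Brien,12,"soccer,baseball",white};
my @broken = parse_line(q{,}, 0, $line);
print "fields without escaping: ", scalar(@broken), "\n";

# Assumed workaround: escape apostrophes with a backslash before parsing.
(my $escaped = $line) =~ s/'/\\'/g;
my @fixed = parse_line(q{,}, 0, $escaped);
print "fields with escaping: ", scalar(@fixed), "\n";
```

If the data may contain embedded quotes, escapes, or other CSV dialect quirks, a dedicated CSV module is likely the safer choice.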
5

In response to how to do it with Text::CSV(_PP), here is a quick example.

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV_PP;
my $parser = Text::CSV_PP->new();

my $string = "Paul,12,\"soccer,baseball,hockey\",white";

$parser->parse($string);
my @fields = $parser->fields();

print "$_\n" for @fields;

Normally one would install Text::CSV or Text::CSV_PP through the cpan utility.

To work around not being able to install modules, I suggest you use the 'pure Perl' implementation so that you can 'install' it yourself. The above example will work if you copy the Text::CSV_PP source into a file named CSV_PP.pm in a folder called Text, created in the same directory as your script. You could also put it in some other location and use the `use lib 'directory'` method as discussed previously. See here and here for other ways to get around install restrictions when using CPAN modules.

Joel Berger
  • one more method (though a little more subversive) http://stackoverflow.com/questions/2980297/how-can-i-use-cpan-as-a-non-root-user – Joel Berger Feb 13 '11 at 17:43
2

Use this regex: m/("[^"]+"|[^,]+)(?:,\s*)?/g;

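The above regular expression globally matches each field: either a double-quoted string (which may contain commas) or a run of non-comma characters, followed by an optional comma and any whitespace after it. Note that the surrounding quotes are kept in the captured values and that empty fields are skipped.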

Here is some sample code and the corresponding output.

my $string = "Word1, Word2, \"Commas, inbetween\", Word3, \"Word4Quoted\", \"Again, commas, inbetween\"";
my @arglist = $string =~ m/("[^"]+"|[^,]+)(?:,\s*)?/g;
print "$_\n" for @arglist;

Here is the output:

Word1
Word2
"Commas, inbetween"
Word3
"Word4Quoted"
"Again, commas, inbetween"
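As the output shows, the captured fields still carry their double quotes. A minimal sketch of applying the same regex to the question's string and then stripping the surrounding quotes, so that `$array[2]` matches what was asked for (the quote-stripping step is an addition, not part of the answer above):

```perl
use strict;
use warnings;

my $string = 'Paul,12,"soccer,baseball,hockey",white';

# Same pattern as above: a quoted run or a run of non-commas,
# followed by an optional comma and whitespace.
my @array = $string =~ m/("[^"]+"|[^,]+)(?:,\s*)?/g;

# Remove a leading and/or trailing double quote from each field in place.
s/^"|"$//g for @array;

print "$array[2]\n";    # soccer,baseball,hockey
```

This still skips empty fields and does not handle escaped quotes inside a quoted field, so it only suits simple, well-behaved input.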
0

Try this:

  @array = ($string =~ /^([^,]*),([^,]*),"([^"]*)",([^,]*)$/);

The array will then contain the output you expect. Note that this only works for lines with exactly this layout: four fields, with the third one in double quotes.

vaishali
-1

use strict;
use warnings;
#use Data::Dumper;

my $string = qq/Paul,12,"soccer,baseball,hockey",white/;

#split string into three parts
my ($st1, $st2, $st3) = split(/,"|",/, $string);
#output: st1:Paul,12 st2:soccer,baseball,hockey  st3:white  

#split $st1 into two parts
my ($st4, $st5) = split(/,/,$st1);

#collect the four fields into an array
my @test = ($st4, $st5, $st2, $st3);

#print Dumper \@test;
print "$test[2]\n";

output:

soccer,baseball,hockey 

#$VAR1 = [
#          'Paul',
#         '12',
#          'soccer,baseball,hockey',
#          'white'
#        ];
Nikhil Jain
-1

$string = "Paul,12,\"soccer,baseball,hockey\",white";

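# Temporarily replace commas inside double quotes with a placeholder.
1 while ($string =~ s#"(.*?),(.*?)"#"$1aaa$2"#g);

# Split on the remaining commas, then restore the commas and strip the quotes.
@array = map { s/aaa/,/g; s/"//g; $_ } split(/,/, $string);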

$" = "\n";

print "$array[2]";

Purandaran
  • 1
    Eeeeeek. That's one heavily re-invented (and probably buggy, but who'd bother checking) bicycle. In other words, **please don't write CSV parsers by hand when Text::CSV(_XS) does the job perfectly and has all those freaked out edge and corner cases and bugs already ironed out.** – DVK Feb 13 '11 at 18:08