I have looked for an answer to this question here on Stack Overflow but can't get acceptable results. (Sorry!)

I have a data file that looks like this:

share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST
share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST

... from which I need to extract the values inside double-quotes. In other words, I'd like to get something like this:

share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST

The tricky part I've found is in trying to extract the data inside quotes. Suggestions made here have not worked for me, so I'm guessing I'm just doing it wrong. I also need to extract BOTH values from each line's double-quoted strings, not just the first one; I figure the remaining stuff could easily be parsed by splitting on whitespace.

In case it's relevant, I'm running this on a RHEL box and I need to pull it out with a regexp using Perl.

Thx!

Fredrik Pihl
tyoung13

6 Answers

One option is to treat your data as a CSV file and use Text::CSV_XS to parse it, setting the separator character to a space:

use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new( { binary => 1, sep_char => ' ' } )
  or die "Cannot use CSV: " . Text::CSV_XS->error_diag();

open my $fh, "<:encoding(utf8)", "data.txt" or die "data.txt: $!";
while ( my $row = $csv->getline($fh) ) {
    print join ',', @$row;
    print "\n";
}
$csv->eof or $csv->error_diag();
close $fh;

Output on your dataset:

share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST

Hope this helps!

Kenosis

You can do this:

If literal quotes inside the quoted strings are escaped with a backslash (share "SHA \" RE1" ...):

$str =~ s/(?|"((?>[^"\\]++|\\{2}|\\.)*)"|()) /$1,/gs;

If literal quotes are escaped with another quote (share "SHA "" RE1" ...):

$str =~ s/(?|"((?>[^"]++|"")*)"|()) /$1,/g;

If you are absolutely sure that none of the quoted strings in your data contains an escaped quote:

$str =~ s/(?|"([^"]*)"|()) /$1,/g;
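As a rough illustration, here is the last substitution applied to the first sample line from the question. The wrapper name `quoted_to_csv` is hypothetical and not part of the answer, and the sketch assumes Perl 5.10 or later, which the branch-reset group `(?|...)` requires.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10+

# Hypothetical helper: assumes no escaped quotes inside the quoted fields.
sub quoted_to_csv {
    my ($str) = @_;
    # Match either a quoted string followed by a space (its inside lands in $1)
    # or a bare space (empty $1), and replace the match with "$1,".
    $str =~ s/(?|"([^"]*)"|()) /$1,/g;
    return $str;
}

print quoted_to_csv('share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST'), "\n";
# prints: share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
```

Because every alternative ends with a literal space, a quoted string at the very end of a line would keep its quotes. The sample data always has unquoted fields last, so that case does not come up here.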
Casimir et Hippolyte

Try this.

[^\" ]*

It selects every char but the quotation marks and the spaces.
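A small sketch of what that pattern yields on the second sample line may help. The helper name `tokens` is hypothetical, and the `grep { length }` is an added assumption to drop the empty matches that `*` produces between tokens.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every non-empty run matched by [^" ]*.
sub tokens {
    my ($line) = @_;
    return grep { length } $line =~ /[^" ]*/g;
}

my @t = tokens('share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST');
print scalar(@t), " tokens\n";
# prints: 12 tokens
```

Since spaces are excluded everywhere, the quoted path with spaces comes back as seven separate tokens, not one field. On its own the pattern therefore only suits lines whose quoted values contain no spaces.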

Magnus

I'm not sure I understand the question: the text says one thing but the example shows something different. Anyway, try this:

#!/usr/bin/env perl
use strict;
use warnings;

while (<DATA>) {
  chomp;
  my @matches = $_ =~ /"(.*?)"/g;
  print "@matches\n";
}

__DATA__
share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST
share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST

output:

$ ./p.pl 
SHARE1 /path/to/some/share
SHARE2 /path/to/a/different/share with spaces in the dir name
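To go from the quoted values to the full comma-separated line shown in the question, the same matching idea can be extended so that each field is either the inside of a quoted string or a run of non-whitespace. This is only a minimal sketch, assuming no escaped quotes inside the quoted fields and Perl 5.10 or later for the branch-reset group. `line_to_csv` is a hypothetical helper name.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10+

# Hypothetical helper: turn one input line into a comma-separated line.
sub line_to_csv {
    my ($line) = @_;
    # Each field is either the text between two double quotes
    # or a run of non-whitespace characters; both land in $1.
    my @fields = $line =~ /(?|"([^"]*)"|(\S+))/g;
    return join ',', @fields;
}

print line_to_csv('share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST'), "\n";
# prints: share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
```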
Fredrik Pihl
  • Pretty sure at the very least that regex should be `/"(.*?)"/g` or `/"([^"]*)"/g`, to avoid greedy matching... – Tim Pierce Dec 21 '13 at 22:18
my $str = 'share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST';
$str =~ s/"?\s*"\s*/,/g;
print $str;

This regex replaces each of the following sequences with a comma:
"space" = ,
"space = ,
space" = ,
"" = ,
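The substitution only touches the quote boundaries, so the unquoted options at the end of the line stay separated by spaces. One possible way to finish the job is sketched below. It assumes exactly two quoted fields per line and no commas inside the share name or path, and `quotes_then_spaces` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes two quoted fields and no embedded commas.
sub quotes_then_spaces {
    my ($str) = @_;
    $str =~ s/"?\s*"\s*/,/g;    # quote boundaries become commas
    # Limit the split to 4 parts so the trailing options stay together.
    my ( $type, $name, $path, $opts ) = split /,/, $str, 4;
    $opts =~ s/\s+/,/g;         # whitespace between the options becomes commas
    return join ',', $type, $name, $path, $opts;
}

print quotes_then_spaces('share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST'), "\n";
# prints: share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
```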

Sabuj Hassan
  • Thanks, added. My Perl's YAPE::Regex::Explain module is not installing on my PC. Otherwise I could have explained this in a better way. – Sabuj Hassan Dec 21 '13 at 21:47
  • @sabujhassan You mean something like [this](http://rick.measham.id.au/paste/explain.pl) :P ? – HamZa Dec 21 '13 at 21:48
  • @HamZa indeed. I should bookmark this until I install the module. Thanks a looooooot :D – Sabuj Hassan Dec 21 '13 at 21:51
#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
    # split on whitespace followed by a quote, or a quote followed by whitespace
    my @a = split /\s+"|"\s+/, $_;
    for my $item (@a) {
        if ( $item =~ /"/ ) {
            $item =~ s/"//g;       # a leftover quote: remove it
        }
        else {
            $item =~ s/\s+/,/g;    # otherwise replace whitespace with commas
        }
    }
    print join( ",", @a );
    print "\n";
}

output:

share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST,
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST,

Leave it to you to remove the last comma :)
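The trailing comma comes from the newline at the end of each line, which the whitespace substitution also turns into a comma. A minimal sketch of one fix is to `chomp` the line first. The function name `split_to_csv` is hypothetical, and the logic otherwise follows the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: same split-based approach, with the newline removed first.
sub split_to_csv {
    my ($line) = @_;
    chomp $line;    # drop the newline so it is not turned into a comma
    my @parts = split /\s+"|"\s+/, $line;
    for my $item (@parts) {
        if ( $item =~ /"/ ) {
            $item =~ s/"//g;       # quoted piece: strip the leftover quote
        }
        else {
            $item =~ s/\s+/,/g;    # unquoted piece: whitespace becomes commas
        }
    }
    return join ',', @parts;
}

print split_to_csv(qq{share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST\n}), "\n";
# prints: share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
```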

brianadams