I am trying to write a Perl script that gets all strings that do not start and end with a single quote. A string must not be part of a # comment, and the strings in DATA do not necessarily start at the beginning of a line.

use warnings;
use strict;


my $file; 
{ 
local $/ = undef; 
$file = <DATA>; 
};
my @strings = $file =~ /(?:[^']).*(?:[^'])/g;
print join ("\n",@strings);

__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";

I am getting nowhere with this regex. The expected output is:

"This is string2"
"This is comment syntax #"
"This is string4"
user2763829

2 Answers

1

Obviously this is only an exercise, as there have been many students asking about this problem lately. Regexes will only ever get you part of the way there, as there will pretty much always be edge cases.

The following code is probably good enough for your purposes, but it doesn't even successfully parse itself because of quotes inside a qr{}. You'll have to figure out how to get strings that span lines to work on your own:

use strict;
use warnings;

my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;

my $data = do { local $/; <DATA> };

while ($data =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
    my $match = $1;

    if ($match =~ /^#/) {
        print "Comment - $match\n";

    } elsif ($match =~ /^"/) {
        print "Double quote - $match\n";

    } elsif ($match =~ /^'/) {
        print "Single quote - $match\n";

    } else {
        die "Carp!  something went wrong!  <$match>";
    }
}

__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
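
If only the double-quoted strings are wanted, the if/elsif chain is not strictly needed. The following is a minimal sketch of one way to do it, assuming the same sample input as above: comments and single-quoted strings are still matched so that they get consumed, but only the double-quoted branch sits inside a capture group. The names `$extract_re` and `@strings` are my own, and the same edge-case caveats apply.

```perl
use strict;
use warnings;

# Sample input mirroring the __DATA__ section above.
my $data = <<'END';
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
END

# Three alternatives, tried in order at each position. Only the last one
# is wrapped in a capture group.
my $extract_re = qr{
      \#.*                                 # comment: consumed, not captured
    | '(?: (?> [^\\']+ ) | \\. )*'         # single-quoted: consumed, not captured
    | ( "(?: (?> [^\\"]+ ) | \\. )*" )     # double-quoted: captured by group 1
}x;

# In list context, a /g match with one capture group returns that group for
# every match. The group is undef whenever one of the other branches matched,
# so grep drops those entries.
my @strings = grep { defined } $data =~ /$extract_re/g;

print "$_\n" for @strings;
```

With the sample input this should print the three double-quoted strings and skip both the single-quoted string and the one inside the comment. The branching has not disappeared; it has only moved into the alternation and the `grep`.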
Miller
  • This is an extra question from a lab on Perl regex. The lab assistants were busy helping others with the lab questions. I was wondering if it is possible to achieve this using regex only, without an if/else statement. – user2763829 Apr 01 '14 at 01:10
  • Lab question in perl regex? Please explain. – Miller Apr 01 '14 at 01:12
  • The lab questions are for practicing the material covered in lectures, and the lectures have only covered the basics of regex, not even ?: or ?<. I have completed all the lab questions except for an extra challenge question. As the lab questions are not marked, I will never know the solution if I can't figure it out. – user2763829 Apr 01 '14 at 01:24
0

I do not know how to achieve that with a regular expression alone, so here is a simple hand-written lexer:

#!/usr/bin/perl

use strict;
use warnings;

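# Return the first double-quoted string in the line and the unscanned rest.
# Returns ("", "") when a comment starts or no complete string is left.
sub extract_string {
    my @buf = split //, shift;

    # Test with defined() so that a literal "0" character does not end the loop.
    while (defined(my $peer = shift @buf)) {
        if ($peer eq '"') {
            my $str = $peer;
            while (defined($peer = shift @buf)) {
                $str .= $peer;
                last if $peer eq '"';
            }
            if (defined $peer) {
                return ($str, join '', @buf);
            }
            else {
                return ("", "");    # unterminated string
            }
        }
        elsif ($peer eq '#') {
            return ("", "");        # the rest of the line is a comment
        }
    }
    return ("", "");                # no string and no comment found
}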

my ($str, $buf);

while ($buf = <DATA>) {
    chomp $buf;
    while (1) {
        ($str, $buf) = extract_string $buf;
        print "$str\n" if $str;
        last unless $buf;
    }
}

__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";

Another option is to use a Perl module such as PPI.

Lee Duhem