I have a lot of trouble understanding the basic rules of regex
and hope that someone could help explain them in "plain English".
# The sentence to test against is stored in the default variable $_.
$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';

print "Enter a regular expression: ";
my $pattern = <STDIN>;
chomp($pattern);    # remove the trailing newline from the input

# Match the entered pattern against $_ and print each capture group that is set.
if (/$pattern/) {
    print "The text matches the pattern '$pattern'.\n";
    print "\$1 is '$1'\n" if defined $1;
    print "\$2 is '$2'\n" if defined $2;
    print "\$3 is '$3'\n" if defined $3;
    print "\$4 is '$4'\n" if defined $4;
    print "\$5 is '$5'\n" if defined $5;
}
Three test outputs:
Enter a regular expression: ([a-z]+)
The text matches the pattern '([a-z]+)'
$1 is 'silly'
Enter a regular expression: (\w+)
The text matches the pattern '(\w+)'
$1 is '1'
Enter a regular expression: ([a-z]+)(.*)([a-z]+)
The text matches the pattern '([a-z]+)(.*)([a-z]+)'
$1 is 'silly'
$2 is ' sentence (495,a) *BUT* one which will be usefu'
$3 is 'l'
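To make the three runs above easier to reproduce without typing each pattern by hand, here is a minimal non-interactive sketch of the same test. The helper name captures_for and the hard-coded pattern list are my own additions for illustration and are not part of the original script; it assumes every pattern contains at least one capture group.

```perl
use strict;
use warnings;

my $text = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';

# Hypothetical helper (not in the original script): returns the capture
# groups of the first match of $pattern in $string, or an empty list
# when the pattern does not match.
sub captures_for {
    my ($string, $pattern) = @_;
    my @captures = $string =~ /$pattern/;
    return @captures;
}

# The same three patterns that were entered by hand in the runs above.
for my $pattern ('([a-z]+)', '(\w+)', '([a-z]+)(.*)([a-z]+)') {
    my @captures = captures_for($text, $pattern);
    print "Pattern '$pattern':\n";
    for my $i (0 .. $#captures) {
        printf "  \$%d is '%s'\n", $i + 1, $captures[$i];
    }
}
```

Running it should print the same capture groups as the three interactive runs, which makes it easier to try variations of a pattern side by side.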
My confusion is as follows:
1. Doesn't ([a-z]+) mean "a lowercase letter, repeated one or more times"? If so, shouldn't "will" be picked up as well? Or does it have something to do with () being about memory (i.e. "silly" is a 5-letter word, so "will" will not be picked up, but "willx" will)?
2. Doesn't (\w+) mean "any word, repeated one or more times"? If so, why is the number "1" picked up, when there is no repeat, only a colon ":" after it?
3. Does ([a-z]+)(.*)([a-z]+) mean "any lowercase letter, repeated", immediately followed by "anything, repeated zero or more times", immediately followed by "any lowercase letter, repeated"? If so, why does the output look like the one shown above?
I have searched online as much as I could but still fail to understand these results. Any help will be greatly appreciated. Thank you.