Please explain this Perl regular expression

Question

    $rowfetch =~ s/['-]//g; #All chars inside the [ ] will be filtered out.
    $rowfetch =~ m/(\w+), ?(.)/;
    printf $fh lc($2.$1);

I got help building this regular expression yesterday, but I don't fully understand it.

It takes a name like Parisi, Kenneth and prints out kparisi

Knowns:
s/ = substitute
m/ = match

I tried searching for the rest but couldn't find anything that really helped explain it.

I also didn't understand how the =~ is supposed to evaluate to either true or false, yet in this situation, it is modifying the string.

You should have gone with Konrad's solution (after I fixed it). That one was dead easy to understand. — Paul Tomblin, Dec 19 '08 at 15:00
Oh i didnt know you had fixed it... I'll test it and thanks... Although Vinko's solution is working pretty good for me. I don't know if you saw the comment thread we had, but he helped get rid of some other chars in the strings. I'll let you know if yours works, thanks. — CheeseConQueso, Dec 19 '08 at 15:04
it turns Parisi, Kenneth into kparisienneth
$rowfetch =~ s/(\w+),\s(\w)/$2$1/;
$rowfetch =~ s/([a-z]+)\s([a-z])/$2$1/i;
$rowfetch = lc $rowfetch; — CheeseConQueso, Dec 19 '08 at 15:16
nm the <br>'s i thought this took html... btw is there any way to send a message directly to someone on this site? im obviously new here — CheeseConQueso, Dec 19 '08 at 15:16

Ed Guiness · Answer 1 · 2009-03-11T12:59:44.000

I find the YAPE::Regex::Explain module very helpful -

C:\>perl -e "use YAPE::Regex::Explain;print YAPE::Regex::Explain->new(qr/['-]/)->explain;"
The regular expression:

(?-imsx:['-])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ['-]                     any character of: ''', '-'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------



C:\>perl -e "use YAPE::Regex::Explain; print YAPE::Regex::Explain->new(qr/(\w+), ?(.)/)->explain;"
The regular expression:

(?-imsx:(\w+), ?(.))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
   ?                       ' ' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

C:\>

whoooa hold on what the hay is all this? i appreciate the help, but this is just looking and reading weird... whats with all the ---------------------? — CheeseConQueso, Dec 19 '08 at 15:00
nevermind... it just came up as pre code... last time i viewed it, it was regular formatted — CheeseConQueso, Dec 19 '08 at 15:00
It's output from YAPE::Regex that will look better on your command line. The point is that there is a neat tool to help explain regex. — Ed Guiness, Dec 19 '08 at 15:01

tvanfosson · Accepted Answer · 2008-12-19T15:08:04.047

I keep one of these cheat sheets pinned on my cube wall for just such occasions. Google for regular expression cheat sheet to find others.

To add to what you already know:

  g -- search globally throughout the string (every match, not just the first)
  + -- match at least one, but as many as possible
  ? -- match 0 or 1
  . -- match any character except a newline
 () -- group these together and capture what they match
  , -- a plain comma, no special meaning
 [] -- match any one character inside the brackets
 \w -- match any word character (letter, digit, or underscore)

The magic is in the grouping -- the match expression uses the groups and puts them into variables $1 and $2. In this case $1 matches the word before the comma and $2 matches the first character following the whitespace after the comma.
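To see the groups in action, here is a minimal sketch using the name from the question. The variable names $last and $initial are only illustrative and do not appear in the original code.

```perl
use strict;
use warnings;

my $rowfetch = 'Parisi, Kenneth';

# A successful match fills $1 and $2 with the text each group matched.
if ($rowfetch =~ m/(\w+), ?(.)/) {
    my ($last, $initial) = ($1, $2);         # illustrative names
    print "last=$last initial=$initial\n";   # last=Parisi initial=K
    print lc($initial . $last), "\n";        # kparisi
}
```

Copying $1 and $2 into named variables right after the match is a common habit, since the next successful match overwrites them.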

edited Dec 19 '08 at 15:08

answered Dec 19 '08 at 14:59

tvanfosson


yeah, i promptly removed that from my "knowns" when i found out haha - foolish – CheeseConQueso Dec 19 '08 at 15:07
just a small addition, the whitespace after the comma is optional (due to the ?) – Dashogun Dec 19 '08 at 15:29
@Dashogun. Correct, but his example has the whitespace in it. – tvanfosson Dec 19 '08 at 17:48

score 3 · Answer 3 · answered Dec 22 '08 at 01:17

Download "The Regex Coach" and explore it. Consider purchasing "Mastering Regular Expressions" as it will walk you through this minefield. It is one of the best-typeset books I've ever seen and is deeply informative yet penetrable.

score 1 · Answer 4 · answered Apr 13 '10 at 17:23

There is a great web front end to YAPE::Regex::Explain.

Here is the explanation of s/['-]//g

and for m/(\w+), ?(.)/

answered Apr 13 '10 at 17:23

dawg


score 1 · Answer 5 · answered Dec 19 '08 at 14:58

1st line: the characters inside [] (' and -) are matched and replaced (s) by nothing, and thus removed. /g means global, so every occurrence in the string is replaced, not just the first.

2nd line: \w means a word character, + means one or more times, ? means zero or one time, and "." means any character except a newline. So it finds one or more word characters, followed by a comma, followed by an optional space, followed by any single character.
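A quick way to check what /g changes is to run the same substitution with and without it. This is only a sketch; the sample name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input containing both an apostrophe and a hyphen.
my $with_g    = "O'Hanlon-Smyth";
my $without_g = "O'Hanlon-Smyth";

$with_g    =~ s/['-]//g;   # removes every ' and -
$without_g =~ s/['-]//;    # removes only the first match

print "$with_g\n";      # OHanlonSmyth
print "$without_g\n";   # OHanlon-Smyth
```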

score 1 · Answer 6 · edited Jul 21 '09 at 02:51

$lhs =~ s/foo/bar/g;

The s/ operator is a modifying regexp in Perl - you match the LHS against the first part on the right (foo). The second part specifies the replacement for the match in the first part (bar). So "Lafooey" goes to "Labarey".

In your question, the aim is to remove all ' and - like in "O'Hanlon" and "Chalmonly-Witherington-Smyth".

Then it matches "Lastname, First character of firstname". The parentheses put the values of these matches into the variables $1 and $2.

And prints the lowercase of "F" + "Lastname", because these are the values in $2 and $1.

At the end of it, you have a viable username for a system based upon the person's real name from a telephone directory style listing.
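Put together, the steps above could be wrapped in a small function. This is a sketch under assumptions: the name username_from and the first name "Sean" are invented for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "Lastname, Firstname" into a username.
sub username_from {
    my ($name) = @_;                         # work on a copy of the input
    $name =~ s/['-]//g;                      # drop apostrophes and hyphens
    return unless $name =~ m/(\w+), ?(.)/;   # give up if the format is wrong
    return lc($2 . $1);                      # initial + last name, lowercased
}

print username_from("O'Hanlon, Sean"), "\n";    # sohanlon
print username_from("Parisi, Kenneth"), "\n";   # kparisi
```

Returning early on a failed match keeps the function from building a username out of whatever $1 and $2 happened to hold before.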

score 1 · Answer 7 · answered Dec 19 '08 at 15:00

iirc the =~ means make equal to the match (cf "~" alone returning true if matched)

answered Dec 19 '08 at 15:00

annakata


score 1 · Answer 8 · edited Jul 21 '09 at 02:49

The =~ binds the expression (string) on its left hand side to the regular expression operation on its right hand side. With m// it only tests for a match and does not modify the string; with s/// the substitution modifies the string in place. As a side effect, a successful match sets the variables $1, $2, ... to the bracketed parts matched.

In your case the first bracket, "(\w+)", will match word characters repeated one or more times (the last name), and the second, "(.)", will match a single character (here, the first letter of the given name). The " ?" expression will match an optional space.
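The difference between the two uses of =~ can be seen by looking at what each one returns. A minimal sketch, with a made-up input string:

```perl
use strict;
use warnings;

my $rowfetch = "O'Neil-Smith, Pat";   # hypothetical sample input

# m// leaves the string alone and yields true or false in scalar context.
my $matched = ($rowfetch =~ m/,/) ? 'yes' : 'no';
print "matched: $matched, string: $rowfetch\n";
# matched: yes, string: O'Neil-Smith, Pat

# s/// edits the string in place and returns the number of substitutions.
my $count = ($rowfetch =~ s/['-]//g);
print "replaced: $count, string: $rowfetch\n";
# replaced: 2, string: ONeilSmith, Pat
```

So =~ itself never changes anything; it only says which string the operator on the right should work on.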

score 1 · Answer 9 · answered Jan 16 '09 at 21:49

Note that the given code fails miserably if the input isn't in the right format. Here's what I would do:

$rowfetch =~ s/[ '-]//g; #All chars inside the [ ] will be filtered out.
if($rowfetch =~ m/(\w+),([a-z])/i) {
    # print rather than printf, so a stray % in the data is not treated as a format
    print $fh lc($2.$1);
}

The positional variables $1-$9 hold the groups from the last successful match; they are not reset when a match fails. This means that if the regex fails to match, $1 and $2 keep their old values and you'll end up with something other than what you wanted.

I've also altered the regex slightly. The first line also removes spaces. Since it appears that you are creating usernames or email addresses, you don't want spaces. The second line is stricter to ensure that $2 is a letter, and not some other character. The 'i' at the end tells perl to make all letter matches case insensitive. With it, I don't have to write that second part as ([a-zA-Z]).
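The stale-capture problem is easy to reproduce. The sketch below runs the unguarded match from the question against a good and a bad input in the same scope; both input strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $good = 'Parisi, Kenneth';
my $bad  = 'no comma here';      # hypothetical malformed row

$good =~ m/(\w+), ?(.)/;         # succeeds and sets $1, $2
print lc($2 . $1), "\n";         # kparisi

$bad =~ m/(\w+), ?(.)/;          # fails; $1 and $2 keep their old values
print lc($2 . $1), "\n";         # kparisi again, which is wrong for this row

# Guarding on the match result avoids reusing stale captures.
if ($bad =~ m/(\w+), ?(.)/) {
    print lc($2 . $1), "\n";
} else {
    print "skipped: $bad\n";     # skipped: no comma here
}
```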
