    $rowfetch =~ s/['-]//g; #All chars inside the [ ] will be filtered out.
    $rowfetch =~ m/(\w+), ?(.)/;
    printf $fh lc($2.$1);

I got help building this regular expression yesterday, but I don't fully understand it.

It takes a name like "Parisi, Kenneth" and prints out "kparisi".

Knowns:
s/ = substitute
m/ = match


I tried searching for the rest but couldn't find anything that really helped explain it.

I also didn't understand how the =~ is supposed to evaluate to either true or false, yet in this situation, it is modifying the string.

Billy ONeal
CheeseConQueso
  • You should have gone with Konrad's solution (after I fixed it). That one was dead easy to understand. – Paul Tomblin Dec 19 '08 at 15:00
  • Oh i didnt know you had fixed it... I'll test it and thanks... Although Vinko's solution is working pretty good for me. I don't know if you saw the comment thread we had, but he helped get rid of some other chars in the strings. I'll let you know if yours works, thanks. – CheeseConQueso Dec 19 '08 at 15:04
  • it turns Parisi, Kenneth into kparisienneth
    $rowfetch =~ s/(\w+),\s(\w)/$2$1/;
    $rowfetch =~ s/([a-z]+)\s([a-z])/$2$1/i;
    $rowfetch = lc $rowfetch;
    – CheeseConQueso Dec 19 '08 at 15:16
  • nm the <br>'s, i thought this took html... btw is there any way to send a message directly to someone on this site? im obviously new here – CheeseConQueso Dec 19 '08 at 15:16

9 Answers


I find the YAPE::Regex::Explain module very helpful -

C:\>perl -e "use YAPE::Regex::Explain; print YAPE::Regex::Explain->new(qr/['-]/)->explain;"
The regular expression:

(?-imsx:['-])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ['-]                     any character of: ''', '-'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------



C:\>perl -e "use YAPE::Regex::Explain; print YAPE::Regex::Explain->new(qr/(\w+), ?(.)/)->explain;"
The regular expression:

(?-imsx:(\w+), ?(.))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
   ?                       ' ' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

C:\>
Ed Guiness
  • whoooa hold on what the hay is all this? i appreciate the help, but this is just looking and reading weird... whats with all the ---------------------? – CheeseConQueso Dec 19 '08 at 15:00
  • nevermind... it just came up as pre code... last time i viewed it, it was regular formatted – CheeseConQueso Dec 19 '08 at 15:00
  • It's output from YAPE::Regex that will look better on your command line. The point is that there is a neat tool to help explain regex. – Ed Guiness Dec 19 '08 at 15:01

I keep one of these cheat sheets pinned on my cube wall for just such occasions. Google for regular expression cheat sheet to find others.

To add to what you already know:

  g -- search globally throughout the string
  + -- match at least one, but as many as possible
  ? -- match 0 or 1
  . -- match any character
 () -- group these together
  , -- a plain comma, no special meaning
 [] -- match any character inside the brackets
 \w -- match any word character

The magic is in the grouping -- the match expression uses the groups and puts what they matched into the variables $1 and $2. In this case $1 holds the word before the comma and $2 holds the first character after the comma and the optional space.
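To see the groups at work, here is a minimal sketch using the pattern from the question. The helper name capture_parts is made up for illustration; it is not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two captured groups,
# or an empty list when the pattern does not match.
sub capture_parts {
    my ($name) = @_;
    if ($name =~ m/(\w+), ?(.)/) {
        return ($1, $2);    # $1 = word before the comma, $2 = next character
    }
    return;
}

my ($last, $initial) = capture_parts('Parisi, Kenneth');
print "last=$last initial=$initial\n";    # prints "last=Parisi initial=K"
```

Checking the result of the match before reading $1 and $2 keeps the helper from returning leftovers when the input has no comma.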

tvanfosson

Download "The Regex Coach" and explore it. Consider purchasing "Mastering Regular Expressions" as it will walk you through this minefield. It is one of the best-typeset books I've ever seen and is deeply informative yet penetrable.


There is a great web front end to YAPE::Regex::Explain.

Here is the explanation of s/['-]//g

and for m/(\w+), ?(.)/

dawg

1st line: the characters inside [] (' and -) are matched and replaced (s) by nothing, and thus removed. /g means global: every occurrence in the string is replaced, not just the first.

2nd line: \w means a word character, + means one or more times, ? means zero or one time, and "." means any character (except a newline). So it means: find one or more word characters, followed by a comma, followed by an optional space, followed by any single character.

Loki

$lhs =~ s/foo/bar/g;

The s/ operator is a modifying regexp in Perl - you match the LHS against the first part on the right (foo). The second part specifies the replacement for the match in the first part (bar). So "Lafooey" goes to "Labarey".

In your question, the aim is to remove all ' and - like in "O'Hanlon" and "Chalmonly-Witherington-Smyth".

Then it matches "Lastname, First character of firstname". The parentheses put the values of these matches into the variables $1 and $2.

And prints the lowercase of "F" + "Lastname", because these are the values in $2 and $1.

At the end of it, you have a viable username for a system based upon the person's real name from a telephone directory style listing.
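Put together, the three steps can be sketched as one small function. The name make_username and the first names in the sample inputs are assumptions for illustration; the two patterns are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the question's three steps.
sub make_username {
    my ($name) = @_;
    $name =~ s/['-]//g;                       # drop apostrophes and hyphens
    return unless $name =~ m/(\w+), ?(.)/;    # last name, then first initial
    return lc($2 . $1);                       # initial + last name, lowercased
}

print make_username("O'Hanlon, Patrick"), "\n";                     # prints "pohanlon"
print make_username("Chalmonly-Witherington-Smyth, Jane"), "\n";    # prints "jchalmonlywitheringtonsmyth"
```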

Brad Gilbert
JeeBee

iirc =~ is the binding operator: it applies the match or substitution on its right to the string on its left instead of to $_ (a match returns true if it matched)

annakata

The =~ operator matches the expression (string) on its left-hand side against the regular expression on its right-hand side. With m// it does not modify the string; the s/// on your first line is what changes it. As a side effect, a successful match sets the variables $1, $2, ... to the parenthesized parts matched.

In your case the first group, "(\w+)", will match word characters repeated one or more times (the last name), and the second, "(.)", will match the first letter of the given name. The " ?" expression will match an optional space.
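As a rough illustration of what =~ hands back with each operator, the sketch below runs both patterns from the question on a made-up sample name and keeps the results in variables.

```perl
use strict;
use warnings;

my $name = "O'Neil-Smith, Ann";    # made-up sample input

# m// in scalar context: true on a match, false otherwise; $name is untouched.
my $matched = ($name =~ m/(\w+), ?(.)/) ? 1 : 0;

# s///g returns the number of substitutions made and edits $name in place.
my $count = ($name =~ s/['-]//g);

print "matched=$matched count=$count name=$name\n";
# prints "matched=1 count=2 name=ONeilSmith, Ann"
```

So =~ yields a value either way; whether the string changes depends on the operator to the right of it.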

Brad Gilbert
Diomidis Spinellis

Note that the given code fails miserably if the input isn't in the right format. Here's what I would do:

$rowfetch =~ s/[ '-]//g; #All chars inside the [ ] will be filtered out.
if($rowfetch =~ m/(\w+),([a-z])/i) {
    printf $fh lc($2.$1);
}

The $1-$9 positional variables hold the captures from the last successful match, but they are not reset when a match fails. This means that if the regex fails to match, $1 and $2 will still hold whatever an earlier match left in them, and you'll end up with something other than what you wanted.
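A small sketch of the stale-capture problem, using the original pattern and two made-up inputs in the same scope. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $good = 'Parisi, Kenneth';
my $bad  = 'no comma here';    # deliberately malformed input

$good =~ m/(\w+), ?(.)/;       # succeeds: $1 = "Parisi", $2 = "K"
my $first = lc($2 . $1);

$bad =~ m/(\w+), ?(.)/;        # fails: $1 and $2 keep their old values
my $stale = lc($2 . $1);

# Guarded version: only read the captures when the match succeeded.
my $guarded = ($bad =~ m/(\w+), ?(.)/) ? lc($2 . $1) : undef;

print "first=$first stale=$stale guarded=",
      (defined $guarded ? $guarded : '(skipped)'), "\n";
# prints "first=kparisi stale=kparisi guarded=(skipped)"
```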

I've also altered the regex slightly. The first line now also removes spaces. Since it appears that you are creating usernames or email addresses, you don't want spaces. The second line is stricter, to ensure that $2 is a letter and not some other character. The 'i' at the end tells Perl to make all letter matches case-insensitive. With it, I don't have to write that second part as ([a-zA-Z]).
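To show how the stricter version behaves, here is a minimal sketch that wraps it in a function. The name strict_username and the sample inputs other than "Parisi, Kenneth" are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper built from the stricter pattern above.
sub strict_username {
    my ($name) = @_;
    $name =~ s/[ '-]//g;                               # also strips spaces
    return undef unless $name =~ m/(\w+),([a-z])/i;    # initial must be a letter
    return lc($2 . $1);
}

my $ok       = strict_username('Parisi, Kenneth');       # "kparisi"
my $multi    = strict_username('van der Berg, Anna');    # "avanderberg"
my $rejected = strict_username('Parisi, 3rd');           # undef: "3" is not a letter

print "$ok $multi ", (defined $rejected ? $rejected : '(rejected)'), "\n";
# prints "kparisi avanderberg (rejected)"
```

Because spaces are stripped before matching, a multi-word last name stays in one piece here, whereas the original pattern would capture only the word right before the comma.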