First, let's look at the statement that is giving you trouble:
die "The specified user contains illegal characters!"
unless($user =~/^\w+$/);
That is a another way of writing:
if ( $user !~ /^\w+$/ ) {
die "...";
}
What does the pattern mean?
^ the beginning of the string
\w+ one or more word characters
$ before an optional \n, and the end of the
string
So, the code will consider as valid usernames strings consisting of nothing but word characters and possibly a newline. There are two problems with that:
First, I doubt you intended to accept strings with a newline. The fix for that is easy: use \z
to mean end of the string unequivocally rather than $
.
Second, \w
matches a set considerably larger than just [A-Z_a-z0-9]
. Without other switches, it can match many other word characters in various languages. See **Word Characters in the most recent perlrecharclass
:
\w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_"). It does not match a whole word. To match a whole word, use \w+ . This isn't the same thing as matching an English word, but in the ASCII range it is the same as a string of Perl-identifier characters.
If the /a modifier is in effect ...
\w matches the 63 characters [a-zA-Z0-9_].
otherwise ...
For code points above 255 ...
\w matches the same as \p{Word} matches in this range. That is, it matches Thai letters, Greek letters, etc. This includes connector punctuation (like the underscore) which connect two words together, or diacritics, such as a COMBINING TILDE and the modifier letters, which are generally used to add auxiliary markings to letters.
For code points below 256 ...
if locale rules are in effect ...
\w matches the platform's native underscore character plus whatever the locale considers to be alphanumeric.
if Unicode rules are in effect or if on an EBCDIC platform ...
\w matches exactly what \p{Word} matches.
otherwise ...
\w matches [a-zA-Z0-9_].
So, until 5.14 gains wider acceptance, it is safest to say explicitly [a-z_A-Z0-9]
if those are the only characters you want to match.
$user=~/^\w+;\w$/
With the discussion above in mind, it should now be clear that
$user =~ /^\w+;\w$/
would match only input containing word characters, a
semi-colon, and a trailing word character and possibly a newline.
As to your code,
#!/usr/bin/perl
use CGI;
use CGI::Carp qw(fatalsToBrowser);
$q = new CGI;
First, you are missing
use strict;
use warnings;
Those pragma are not optional if you want to save yourself and possibly the rest of
the world some headaches.
Second, use CGI::Carp qw(fatalsToBrowser);
should only be used as a
short-term clutch if you do not have access to the web server logs.
Third,
$q = new CGI;
should be
my $q = CGI->new;
new CGI
is called indirect object notation and leaves you at the mercy of
perl
as to what your code ends up doing. CGI->new
unambiguously invokes
the new
method provided by CGI
. As an aside, I hate $q
or $query
as
names of variables holding CGI
objects. Just a simple $cgi
is more
meaningful.
Finally, looking at:
print $q->header,
$q->start_html('Finger User'),
$q->h1('Finger User'),
print "<pre>";
So, you print some HTML using the HTML generation methods provided by CGI
and some by hand. That hodge-podge style and some of the unwieldy tangled
messes one ends up putting in code is a good reason to avoid using the HTML
generation methods provided by CGI
.
Switch instead to CGI::Simple and
use a templating package such as
HTML::Template to separate code
from HTML content. Something along the lines of the following untested
script should work. Keep in mind that you can always test this by using one
of the two debug modes provided by CGI::Simple
:
#!/usr/bin/env perl
use strict;
use warnings;
use CGI::Simple;
use HTML::Template;
run();
sub run {
my $cgi = CGI::Simple->new;
my $tmpl = HTML::Template->new(filehandle => \*DATA);
my $user = $cgi->param('finger_user');
unless (defined $user) {
show_form($cgi, $tmpl);
return;
}
if (($user) = ($user =~ /^([A-Z_a-z0-9]{1,40})\z/)) {
show_output($cgi, $tmpl, $user);
}
else {
show_error($cgi, $tmpl, "Invalid user name");
}
return;
}
sub show_form {
my ($cgi, $tmpl) = @_;
$tmpl->param(FORM => 1);
print $cgi->header(
-type => 'text/html',
-charset => 'utf-8',
), $tmpl->output;
return;
}
sub show_error {
my ($cgi, $tmpl, $msg) = @_;
$tmpl->param(ERRORMSG => $msg);
print $cgi->header(
-type => 'text/html',
-charset => 'utf-8',
), $tmpl->output;
return;
}
sub show_output {
my ($cgi, $tmpl, $user) = @_;
$tmpl->param(
USER => $user,
OUTPUT => scalar `finger -s $user`,
);
print $cgi->header(
-type => 'text/html',
-charset => 'utf-8',
), $tmpl->output;
return;
}
__DATA__
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>finger
<TMPL_IF USER>
<TMPL_VAR USER>
<TMPL_ELSE>
a user
</TMPL_IF>
on our system</title>
</head>
<body>
<TMPL_IF ERRORMSG>
<p syle="color:#e11"><TMPL_VAR ERRORMSG></p>
</TMPL_IF>
<TMPL_IF OUTPUT>
<h1>finger <TMPL_VAR USER></h1>
<pre><TMPL_VAR OUTPUT></pre>
</TMPL_IF>
<TMPL_IF FORM>
<form id="finger_form" name="finger_form" method="GET">
<p><label for="finger_user"><input id="finger_user" name="finger_user" type="text"
size="51"><input type="submit" value="finger" id="finger_submit"
name="finger_submit"></p>
</form>
</TMPL_IF>
</body>
</html>