1

I am new to both Perl and complicated regular expressions. I mean I've used the * from regular expressions before but nothing more complicated than that. In the script below I know that there is a very big security hole in which perl code can be injected and ran so that any command can be executed even a shell. In trying to stop this injection, I've come to realize that regular expressions are much harder than I thought. The book I am using says to use the combination of

die "The specified user contains illegal characters!"
      unless($user =~/^\w+$/);

I'm fairly certain that this means that the input from the user must begin with more than one word but I'm not sure how this stops a command from being injected because it never checks for a semicolon. I think that the unless clause should be more like

unless($user=~/^\w+;\w$/);

However, neither seem to work. Any help with this would be awesome because I would really like to understand this. Thanks!

#!/usr/bin/perl

use CGI;
use CGI::Carp qw(fatalsToBrowser);
$q = new CGI;

print $q->header,
    $q->start_html('Finger User'),
    $q->h1('Finger User'),
print "<pre>";

$user = $q->param("user");

#die "the specified user contains illegal characters!"
#   unless ($user =~ /ls/);
if (!($user =~ /^\w*;\w*$/)){
    print `/usr/bin/finger -s $user`;
}

print "</pre>";
print $q->end_html;
Sinan Ünür
  • 116,958
  • 15
  • 196
  • 339
tpar44
  • 1,431
  • 4
  • 22
  • 35
  • 2
    Not "more than one word". The way to read that line is "die unless `$user` is made up exclusively of the characters A-Z, a-z, 0-9, and the underscore. – Ray Toal Nov 12 '11 at 03:19
  • 1
    as Ray Toal says \w matches alphanumeric and '_'. When you say it doesnt work, what is the user name that you are trying? – SAN Nov 12 '11 at 03:24
  • It makes sense if you think of "unless" as the opposite of "if". – mob Nov 12 '11 at 03:42
  • 1
    `\w` matches waaayyy more than `[A-Z_a-z0-9]`. – Sinan Ünür Nov 12 '11 at 03:50
  • @Aki I am trying joeshmo; ls -l; I believe that it should not be allowed since the ';' is not a word character but it is going through anyway. – tpar44 Nov 12 '11 at 18:11
  • yes, that should not match the pattern `/^\w+$/` i am hoping you will find the real issue if you follow the suggestions like `use warnings` by Sinan. There is a great post about [debugging cgi scripts](http://stackoverflow.com/questions/2165022/how-can-i-troubleshoot-my-perl-cgi-script) – SAN Nov 12 '11 at 19:33

3 Answers3

5

First, let's look at the statement that is giving you trouble:

die "The specified user contains illegal characters!"
      unless($user =~/^\w+$/);

That is a another way of writing:

 if ( $user !~ /^\w+$/ ) {
     die "...";
 }

What does the pattern mean?

  ^                        the beginning of the string
 \w+                       one or more word characters
  $                        before an optional \n, and the end of the
                           string

So, the code will consider as valid usernames strings consisting of nothing but word characters and possibly a newline. There are two problems with that:

First, I doubt you intended to accept strings with a newline. The fix for that is easy: use \z to mean end of the string unequivocally rather than $.

Second, \w matches a set considerably larger than just [A-Z_a-z0-9]. Without other switches, it can match many other word characters in various languages. See **Word Characters in the most recent perlrecharclass:

\w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_"). It does not match a whole word. To match a whole word, use \w+ . This isn't the same thing as matching an English word, but in the ASCII range it is the same as a string of Perl-identifier characters.

    If the /a modifier is in effect ...

    \w matches the 63 characters [a-zA-Z0-9_].
    otherwise ...
        For code points above 255 ...

        \w matches the same as \p{Word} matches in this range. That is, it matches Thai letters, Greek letters, etc. This includes connector punctuation (like the underscore) which connect two words together, or diacritics, such as a COMBINING TILDE and the modifier letters, which are generally used to add auxiliary markings to letters.
        For code points below 256 ...
            if locale rules are in effect ...

            \w matches the platform's native underscore character plus whatever the locale considers to be alphanumeric.
            if Unicode rules are in effect or if on an EBCDIC platform ...

            \w matches exactly what \p{Word} matches.
            otherwise ...

            \w matches [a-zA-Z0-9_].

So, until 5.14 gains wider acceptance, it is safest to say explicitly [a-z_A-Z0-9] if those are the only characters you want to match.

$user=~/^\w+;\w$/

With the discussion above in mind, it should now be clear that $user =~ /^\w+;\w$/ would match only input containing word characters, a semi-colon, and a trailing word character and possibly a newline.

As to your code,

#!/usr/bin/perl

use CGI;
use CGI::Carp qw(fatalsToBrowser);
$q = new CGI;

First, you are missing

use strict; 
use warnings;

Those pragma are not optional if you want to save yourself and possibly the rest of the world some headaches.

Second, use CGI::Carp qw(fatalsToBrowser); should only be used as a short-term clutch if you do not have access to the web server logs.

Third,

$q = new CGI;

should be

my $q = CGI->new;

new CGI is called indirect object notation and leaves you at the mercy of perl as to what your code ends up doing. CGI->new unambiguously invokes the new method provided by CGI. As an aside, I hate $q or $query as names of variables holding CGI objects. Just a simple $cgi is more meaningful.

Finally, looking at:

print $q->header,
    $q->start_html('Finger User'),
    $q->h1('Finger User'),
print "<pre>";

So, you print some HTML using the HTML generation methods provided by CGI and some by hand. That hodge-podge style and some of the unwieldy tangled messes one ends up putting in code is a good reason to avoid using the HTML generation methods provided by CGI.

Switch instead to CGI::Simple and use a templating package such as HTML::Template to separate code from HTML content. Something along the lines of the following untested script should work. Keep in mind that you can always test this by using one of the two debug modes provided by CGI::Simple:

#!/usr/bin/env perl

use strict;
use warnings;

use CGI::Simple;
use HTML::Template;

run();

sub run {
    my $cgi = CGI::Simple->new;
    my $tmpl = HTML::Template->new(filehandle => \*DATA);

    my $user = $cgi->param('finger_user');

    unless (defined $user) {
        show_form($cgi, $tmpl);
        return;
    }

    if (($user) = ($user =~ /^([A-Z_a-z0-9]{1,40})\z/)) {
        show_output($cgi, $tmpl, $user);
    }
    else {
        show_error($cgi, $tmpl, "Invalid user name");
    }

    return;
}

sub show_form {
    my ($cgi, $tmpl) = @_;

    $tmpl->param(FORM => 1);

    print $cgi->header(
        -type    => 'text/html',
        -charset => 'utf-8',
    ), $tmpl->output;

    return;
}

sub show_error {
    my ($cgi, $tmpl, $msg) = @_;

    $tmpl->param(ERRORMSG => $msg);

    print $cgi->header(
        -type    => 'text/html',
        -charset => 'utf-8',
    ), $tmpl->output;

    return;
}

sub show_output {
    my ($cgi, $tmpl, $user) = @_;

    $tmpl->param(
        USER => $user,
        OUTPUT => scalar `finger -s $user`,
    );


    print $cgi->header(
        -type    => 'text/html',
        -charset => 'utf-8',
    ), $tmpl->output;

    return;
}


__DATA__
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>finger
<TMPL_IF USER>
<TMPL_VAR USER>
<TMPL_ELSE>
a user
</TMPL_IF>
on our system</title>
</head>

<body>

<TMPL_IF ERRORMSG>
<p syle="color:#e11"><TMPL_VAR ERRORMSG></p>
</TMPL_IF>

<TMPL_IF OUTPUT>
<h1>finger <TMPL_VAR USER></h1>
<pre><TMPL_VAR OUTPUT></pre>
</TMPL_IF>

<TMPL_IF FORM>
<form id="finger_form" name="finger_form" method="GET">
<p><label for="finger_user"><input id="finger_user" name="finger_user" type="text"
size="51"><input type="submit" value="finger" id="finger_submit"
name="finger_submit"></p>
</form>
</TMPL_IF>

</body>
</html>
Sinan Ünür
  • 116,958
  • 15
  • 196
  • 339
2

Another important point here. As written your code will allow anything other than two alphanumeric sequences separated by a single semicolon. For example alice; echo "cracked!"; bob is a perfectly valid input to your program as written, because it contains two semicolons and some other non-alphanumeric characters.

The general principle here is that you should generally test for, and only accept, "good" input rather than reject "bad" input. Here is one of many good articles on the topic.

Dan
  • 10,990
  • 7
  • 51
  • 80
1

\w matches a single character, not a word. It is one of [A-Za-z0-9_] in ASCII case.

\w+ matches one or more of the above characters e.g., a_b0c.

^ and $ make sure that there is nothing else in the $user string.

So $user =~ /^\w+$/ is true if $user contains only alphanumerical characters and underscore and nothing else. The program dies if the condition is false.

$ can also match before newline at end of string. If $user might end with a newline and you'd like to reject such cases then you could use \z instead of $. \z matches only at end of string.

jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • `\w` matches waaaayyyyyyyyyyyy more than just `[A-Z_a-z0-9]` and `$` can match `\n`. – Sinan Ünür Nov 12 '11 at 05:25
  • 1
    @Sinan why dont you explain what else it matches, i am curious to know – SAN Nov 12 '11 at 05:54
  • 1
    @Sinan Ünür: What does it match in ascii case other than [A-Za-z0-9_]? What does it match in general other than alphanumerical characters and underscore? – jfs Nov 12 '11 at 05:59
  • @Sinan Ünür: `$` doesn't match `\n` in this case. There are any flags set for the regex e.g., `m` flag is not set. – jfs Nov 12 '11 at 06:01
  • `perl -le "$x=qq(hellō_world\n); print q(matches) if $x =~ /^\w+$/"` – Jim Davis Nov 12 '11 at 07:23
  • My above comment shows "matches" in a Windows XP command prompt with ActiveState perl 5.14.2. Even without the unicode, it does seem to show that `/^\w+$/` matches a string ending with a newline. – Jim Davis Nov 12 '11 at 07:25
  • @Jim Davis: It is unrelated to Unicode. [`$` Match the end of the line (or *before newline at the end*) ... Embedded newlines will not be matched by "^" or "$"](http://perldoc.perl.org/perlre.html). Here's [an example](http://ideone.com/vjB3X) – jfs Nov 12 '11 at 08:03
  • I totally brainfarted on the `$` disagreement and read it backwards. – Jim Davis Nov 12 '11 at 13:41
  • 1
    I should have said `$user =~ /^\w+$/` would match even when `$user` does not consist entirely of word characters: i.e., when there is a trailing newline. The *intent* of the programmer to is to only let word characters through. Therefore, `\z` should be used instead of `$` to anchor the pattern. There is also the potential problem that `\w` does not mean what you seem to think it means. – Sinan Ünür Nov 12 '11 at 23:06
  • @Sinan Ünür: You're right about trailing newline. I've updated the answer correspondingly. I've made *ASCII* more prominent in the text to avoid possible confusion with `\w` meaning. – jfs Nov 13 '11 at 04:59