Need help understing this short C++ program and its vulnerability

Question

I would love if someone could explain me exactly what the code does. I know there's a buffer overflow and bash command execution vulnerability - but since I'm a network guy and not a programmer, I really could use some help to understand the entire code. Thanks in advance!

int main () {
    int status;
    char t[1024]="ps -eo lstart,cmd | grep ";
    cout << "Content-type:text/html\r\n\r\n"<<endl;
    char *value = getenv("QUERY_STRING");
    strcat(t,value);
    status = system(strcat(t," | grep -v grep | head -n 1 | awk '{ print $1\" \"$3\" \"$2\" \"$5\" \"$4}'"));

    return 0;
}

You can make it C code by replacing `cout` with the corresponding `printf()` or `puts()` call. (Everything else can be done in C.) Btw. you should add the necessary `#include`s to make this sample complete. — Scheff's Cat, Jun 04 '18 at 16:34
As for what the code does, well... it does stuff that would be best left to a shell script. — Christian Gibbons, Jun 04 '18 at 16:37
What the code does is 1) append the contents of the environment variable $QUERY_STRING to the string, `t`, 2) append everything of the second `strcat()` 3) executes the command pipline generated that way. Problems: 1) buffer overflow if the contents of $QUERY_STRING is too big so that everything together exceeds 1023 bytes 2) $QUERY_STRING could content 'evil' commands that would be executed — Ingo Leonhardt, Jun 04 '18 at 16:54
@IngoLeonhardt: since the contents of QUERY_STRING are not quoted in the command, even a little whitespace would be an issue. — rici, Jun 04 '18 at 16:58

einpoklum · Accepted Answer · 2018-06-04T21:52:05.357

tl;dr: This is what your code does, as a shell script:

#!/bin/bash
echo -en "Content-type:text/html\r\n\r\n"
ps -eo lstart,cmd | grep init | grep -v $QUERY_STRING | \
head -n 1 | awk '{ print $1" "$3" "$2" "$5" "$4}'

Now for the longer answer.

Rewriting the code

First, let's make that thing into C++ rather than C (like your tag suggests you're asking about) with a bit of error handling, then talk about what's going on:

#include <iostream>
#include <string>
#include <string_view>

int main () {
    auto query_string = getenv("QUERY_STRING");
    if (query_string == nullptr) {
        std::cerr << "Couldn't obtain QUERY_STRING environment variable\n";
        return EXIT_FAILURE;
    }
    if (std::string_view{query_string}.empty()) {
        std::cerr << "Empty query string (QUERY_STRING environment variable)\n";
        return EXIT_FAILURE;
    }
    std::stringstream command_line;
    command_line 
        << "ps -eo lstart,cmd | grep "
        << query_string 
        << " | grep -v grep | head -n 1 | awk '{ print $1\" \"$3\" \"$2\" \"$5\" \"$4}'";
    std::cout << "Content-type:text/html\r\n\r\n";
    return system(command_line.str()); // security vulnerability, see below
}

What are we doing here?

So, we're creating a command-line here which we then execute using the system() function. It's an invocation of a ps command with some switches, followed by some text processing with grep, head and awk - using the pipe mechanism to move the output of each command to the next. They key part is that we use the environment variable QUERY_STRING to filter the ps results, i.e. we list processes which match some phrase. If we compile this program, set the environment variable and run, this is what it looks like:

$ export QUERY_STRING=init
$ ./the_program
Content-type:text/html


Sun 3 Jun 2018 21:48:56

What this has given us is the start time of the first process whose command-line doesn't include the phrase "init". So now you can guess my system has been up since yesterday...

Finally, as a network guy, you probably realize the "Content-type" mumbo-jumbo and the double-newline is a MIME header, so this output is probably intended to be used as an HTTP response. Probably this is intended as some sort of CGI script.

Security vulnerabilities

In the original code, the buffer size was arbitrarily limited to 1024 - while there was nothing limiting QUERY_SIZE not to be longer than that. If it is longer, you would have memory corruption, which can have security implications; and an attacker would likely be able to figure out your memory layout, so it's more dangerous. This is gone with the C++ version.
The second vulnerability has to do with the system command. We're injecting an arbitrary string into the string we're creating; and there's nothing preventing someone from setting
```
$export QUERY_STRING="dummy; rm -rf $HOME ; echo"
```
in which case you would run:
```
ps -eo lstart,cmd | grep dummy; rm -rf $HOME ; echo | grep -v init | head -n 1 | awk '{ print $1" "$3" "$2" "$5" "$4}'
```
and this would delete everything under the effective user's home directory. Or it could be any command, including compilation of a custom C/C++ program to run some arbitrary code on your system. Very bad.
Even if you sanitized QUERY_STRING to only be a valid grep pattern, there might still be a denial-of-service attack if someone were to supply complex, ultra-long grep'ing patterns somehow. So limiting the length is also a good idea.

I wouldn't call that code a "proper C++" -- it is considerably less efficient and harder on eyes than original. Plus, it still contains injection vulnerability. — C.M., Jun 04 '18 at 19:01
@C.M.: Point taken about the vulnerability, but I did mark it as such. As for efficiency - this is not a consideration here. Do you mean the construction of several strings? You're right, I could have used a `std::stringstream`, but I thought this would be simpler. Changed now. — einpoklum, Jun 04 '18 at 19:15
I was protesting use of "proper" noun here -- since new code is more verbose than old one, it makes an impression that "proper C++" code is more verbose than C. Which [isn't necessarily the case](https://godbolt.org/g/cwVn6F). I would also object the notion that C++ means "use iostreams for i/o" -- there is nothing wrong with using cstdlib/etc in C++ code, in fact in my experience iostreams lib creates more problems than it solves. — C.M., Jun 04 '18 at 21:45
@C.M.: Well, making the old code do what the new code does would make it even more verbose still; but then, I don't mind more verbosity if it's not terrible and redundant, and if it allows for clarify. Also - the old code was using iostreams for I/O already - that's not something I added. I'm not particularly fond of iostreams either. — einpoklum, Jun 04 '18 at 21:49

score 0 · Answer 2 · answered Jun 04 '18 at 17:13

This just declares the variable status

int status;

This declared t, with the C-String "ps -eo lstart,cmd | grep ". t has a maximum of 1024 bytes in capacity (1023 for string + 1 byte for \0)

char t[1024]="ps -eo lstart,cmd | grep ";

Print the string below

cout << "Content-type:text/html\r\n\r\n"<<endl;

get environment variable QUERY_STRING

char *value = getenv("QUERY_STRING");

concatenate the value above to t. Here you may have the buffer overflow because you don't know the size of the string in value and it may be longer than 1024 bytes

strcat(t,value);

Invoke the system() function for another concatenation of the previous t with all these grep, etc, etc commands... here you could have another buffer overflow, once t may overflow its 1024 bytes here as well.

status = system(strcat(t," | grep -v grep | head -n 1 | awk '{ print $1\" \"$3\" \"$2\" \"$5\" \"$4}'"));

Need help understing this short C++ program and its vulnerability

2 Answers2

Rewriting the code

What are we doing here?

Security vulnerabilities