tl;dr: This is what your code does, as a shell script:
#!/bin/bash
echo -en "Content-type:text/html\r\n\r\n"
ps -eo lstart,cmd | grep $QUERY_STRING | grep -v grep | \
head -n 1 | awk '{ print $1" "$3" "$2" "$5" "$4}'
Now for the longer answer.
Rewriting the code
First, let's make that thing into C++ rather than C (like your tag suggests you're asking about) with a bit of error handling, then talk about what's going on:
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <string_view>

int main () {
    const char* query_string = std::getenv("QUERY_STRING");
    if (query_string == nullptr) {
        std::cerr << "Couldn't obtain QUERY_STRING environment variable\n";
        return EXIT_FAILURE;
    }
    if (std::string_view{query_string}.empty()) {
        std::cerr << "Empty query string (QUERY_STRING environment variable)\n";
        return EXIT_FAILURE;
    }
    std::stringstream command_line;
    command_line
        << "ps -eo lstart,cmd | grep "
        << query_string
        << " | grep -v grep | head -n 1 | awk '{ print $1\" \"$3\" \"$2\" \"$5\" \"$4}'";
    // Flush the header now: the child started by system() writes to stdout
    // directly and would otherwise overtake what is still in std::cout's buffer
    std::cout << "Content-type:text/html\r\n\r\n" << std::flush;
    // system() takes a C string and returns an implementation-defined status
    return std::system(command_line.str().c_str()) == 0 ? EXIT_SUCCESS : EXIT_FAILURE; // security vulnerability, see below
}
What are we doing here?
So, we're building a command line which we then execute using the system() function. It's an invocation of the ps command with some switches, followed by some text processing with grep, head and awk, using the pipe mechanism to feed the output of each command into the next. The key part is that we use the environment variable QUERY_STRING to filter the ps results, i.e. we list processes which match some phrase (the extra grep -v grep drops the grep process itself, whose command line also contains the phrase). If we compile this program, set the environment variable and run it, this is what it looks like:
$ export QUERY_STRING=init
$ ./the_program
Content-type:text/html
Sun 3 Jun 2018 21:48:56
What this has given us is the start time of the first process whose command line includes the phrase "init". That is usually init itself, the first process started at boot, so now you can guess my system has been up since yesterday...
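Finally, as a network guy, you probably realize that the "Content-type" line followed by an empty line (that's the \r\n\r\n) is an HTTP header block, so this output is meant to be sent as an HTTP response. Combined with reading QUERY_STRING from the environment, which is how CGI hands a program the part of the URL after the '?', this is almost certainly a CGI script.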
Security vulnerabilities
- In the original code, the buffer size was arbitrarily limited to 1024, while nothing prevented QUERY_STRING from being longer than that. If it is longer, you get a buffer overflow, i.e. memory corruption, which can have security implications: an attacker who can work out your memory layout may be able to exploit it. This is gone in the C++ version, since std::stringstream grows as needed.
- The second vulnerability has to do with the system() call. We're pasting an arbitrary string into the command line we're building, and there's nothing preventing someone from setting
$ export QUERY_STRING='dummy; rm -rf $HOME ; echo'
in which case you would run:
ps -eo lstart,cmd | grep dummy; rm -rf $HOME ; echo | grep -v grep | head -n 1 | awk '{ print $1" "$3" "$2" "$5" "$4}'
and this would delete everything under the effective user's home directory. Or it could be any command, including compilation of a custom C/C++ program to run some arbitrary code on your system. Very bad.
- Even if you sanitized QUERY_STRING so that it can only be a valid grep pattern, there could still be a denial-of-service attack if someone supplied ultra-long or pathologically complex patterns. So limiting the length is also a good idea.