13

I would like to rename all variables within the file to random names.

For example this:

$example = "some $string";
function ($variable2) {
    echo $variable2;
}
foreach ($variable3 as $key => $var3val) {
    echo $var3val . "somestring";
}

Will become this:

$frk43r = "some $string";
function ($izi34ee) {
    echo $izi34ee;
}
foreach ($erew7er as $iure7 => $er3k2) {
    echo $er3k2 . "somestring";
}

This doesn't look like an easy task, so any suggestions would be helpful.

joan16v
JohnyFree
  • 1
    So, you want to use php to modify php? Mind if I ask why? This whole process seems illogical. – Epodax Sep 04 '15 at 08:17
  • There are plenty of obfuscators available online to use, though I have to wonder what your reasoning is for thinking you need one. – Jonnix Sep 04 '15 at 08:18
  • And frankly, use of `$variable1`, `$variable2` and `$variable3` is probably doing just as good a job... – Jonnix Sep 04 '15 at 08:20
  • 2
    If you want to stand a chance at changing varnames reliably (ie without changing the way the code works), you really should look at _parsing_ the code: nikic (contributor to the PHP source) wrote a PHP parser in php, you can get it [here, on github](https://github.com/nikic/PHP-Parser) – Elias Van Ootegem Sep 04 '15 at 08:27
  • @JonStirling to prevent licence violations from beginner developers. Yes I know that code obfuscator doesn't give security. But it still prevents 90% of cases, so why not using it. Most customers who see the code is obfuscated will prefer to buy the product than investing time to break the code. We used paid obfuscators. The latest one just stopped working. We would also like to implement some own methods, this is why we are creating own obfuscator. But this is not the question I asked, my question is about renaming variable. – JohnyFree Sep 04 '15 at 08:31
  • @JohnyFree ... the downvote wasn't me... – Jonnix Sep 04 '15 at 08:35
  • I apologize then for that. – JohnyFree Sep 04 '15 at 08:36
  • 1
    you could use regular expressions to identify all variable names and put them in an array. Then delete double entries in this array. At last you could do a str_replace() for each unique entry in this array, replacing by a random string... edit: of course you need to parse the php-file for this task, as Elias Van Ootegem already wrote – SaschaP Sep 04 '15 at 08:41
  • @SaschaP: Of course, a regex will get you nowhere if the code you're processing uses things like [`get_defined_vars`](http://php.net/manual/en/function.get-defined-vars.php), or worse: `$GLOBALS['varname']`, not to mention variable variables (`$var1 = 'varname'; echo $$var1;`) and functions using global variables (defined in another file) using `global $foobar;`... there's just too much to take into account, regex's won't work – Elias Van Ootegem Sep 04 '15 at 10:03
  • I think I will use function token_get_all() to get all variables. Then I just need to use foreach and if $token[0] == T_VARIABLE inside foreach. I will publish solution once done. – JohnyFree Sep 04 '15 at 10:09
  • @JohnyFree: Read my last comment: `token_get_all` will *not* handle dynamic stuff like variable variables, usage of `get_defined_vars`, using variables in multiple file names (`global $varname;`), super-globals (`$GLOBALS['varname'];` -> the key has to be updated to hold the new variable name, too). – Elias Van Ootegem Sep 04 '15 at 10:13
  • Fortunately I don't have such scenarios because I wrote the code. – JohnyFree Sep 04 '15 at 10:23
  • @EliasVanOotegem yes, you are right that regex will not cover all possible variable declarations. I gave the comment because I think my solution would be possible with the given example of the questioner. BTW: your example with variable variables would also work with regex: `$var1 = 'varname'; echo $$var1;`. If you now replace '`$var1`' by '`$r2d2`' the code would be still valid... – SaschaP Sep 04 '15 at 10:41
  • 1
    @SaschaP: You'd also have to change the right hand operand (string constant `'varname'`) to whatever the new variable name of `$varname` is. Of course, the value of `$var1` might be a string returned by a function (`$var1 = $this->getPropertyName(); return $this->{$var1};`), so `getPropertyName` has to return the random string... that's where things get really tricky. – Elias Van Ootegem Sep 04 '15 at 10:44
  • @EliasVanOotegem Oh! now I see the point! Ok, so the variable variables scenario won't work with my solution... – SaschaP Sep 04 '15 at 10:47
  • 1
    @SaschaP and to the OP: if you can come up with a solution that can reliably handle [code like this](https://eval.in/427976), I'll award a bounty to your answer, because that would be rather impressive ;) – Elias Van Ootegem Sep 04 '15 at 10:49
  • @EliasVanOotegem Ok, i've got the code now. It can handle exactly your given code except the part ` = getVarname()` but I'm working on it ;-) I confess my code should be improved for more varying code, but it works mostly :-) How can I send it to you or where can I post it? – SaschaP Sep 04 '15 at 12:52
  • @SaschaP: Well, you can post it here as an answer. Mind you: if I move the function definition to another file, would that break your solution, or can it cope with that? – Elias Van Ootegem Sep 04 '15 at 13:18
  • @SaschaP: you could also post the code on gist or something – Elias Van Ootegem Sep 04 '15 at 13:20
  • @EliasVanOotegem I'm currently working on a solution to determine the function which get called and then to change the return value. So I think it could cope with that. I'll post my code as answer now – SaschaP Sep 04 '15 at 13:21
  • @SaschaP: Note that the way a function returns (ie where it gets its data from) might not be a hard-coded string, it's also worth checking that you're not messing up too many string constants. [run your solution against this](https://eval.in/428143) – Elias Van Ootegem Sep 04 '15 at 13:26
  • @EliasVanOotegem I will edit my code when I've considered your cases! – SaschaP Sep 04 '15 at 13:30
  • Ever thought about compiling the code? Zend offers a compiler by themselves, but others also do. We used ionCube for one project and it did its job very well. Besides from protecting our code, it even gave us a significant performance boost. – D. E. Dec 02 '16 at 13:54
  • made an edit to my answer. please have a look at it – SaschaP Dec 04 '16 at 12:51

5 Answers

11

I would use token_get_all to parse the document and map a registered random string replacement on all interesting tokens.

To obfuscate all the variable names, replace T_VARIABLE in one pass, ignoring all the superglobals.

Additionally, for the bounty's requisite function names, replace all the T_FUNCTION declarations in the first pass. Then a second pass is needed to replace all the T_STRING invocations because PHP allows you to use a function before it's declared.

For this example, I generated names from lowercase letters only, to avoid case-insensitive clashes with function names, but you can use whatever characters you want and add an extra conditional check for increased complexity. Just remember that names can't start with a digit.

I also registered all the internal function names with get_defined_functions to protect against the off chance that a randomly generated string matches one of those function names. Keep in mind this won't protect against functions from extensions installed on the machine running the obfuscated script that are not present on the server doing the obfuscation. The chances of such a clash are astronomically small, but you can always increase the length of the randomly generated string to reduce them even further.

<?php

$tokens = token_get_all(file_get_contents('example.php'));

$globals = array(
    '$GLOBALS',
    '$_SERVER',
    '$_GET',
    '$_POST',
    '$_FILES',
    '$_COOKIE',
    '$_SESSION',
    '$_REQUEST',
    '$_ENV',
    // not a superglobal, but renaming it would break every method body
    '$this',
);

// prevent name clashes with randomly generated strings and native functions
$registry = get_defined_functions();
$registry = $registry['internal'];

// first pass to change all the variable names and function name declarations
foreach($tokens as $key => $element){
    // make sure it's an interesting token
    if(!is_array($element)){
        continue;
    }
    switch ($element[0]) {
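        case T_FUNCTION:
            $prefix = '';
            // this jumps over the whitespace to get the function name
            $index = $key + 2;
            // anonymous functions and by-reference declarations have no
            // T_STRING name at this position, so leave them alone
            if(!isset($tokens[$index]) || !is_array($tokens[$index]) || $tokens[$index][0] !== T_STRING){
                continue 2;
            }
            break;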

        case T_VARIABLE:
            // ignore the superglobals
            if(in_array($element[1], $globals)){
                continue 2;
            }
            $prefix = '$';
            $index = $key;
            break;

        default:
            continue 2;
    }

    // check to see if we've already registered it
    if(!isset($registry[$tokens[$index][1]])){
        // make sure our random string hasn't already been generated
        // or just so crazily happens to be the same name as an internal function
        do {
            $replacement = $prefix.random_str(16);
        } while(in_array($replacement, $registry));

        // map the original and register the replacement
        $registry[$tokens[$index][1]] = $replacement;
    }

    // rename the variable
    $tokens[$index][1] = $registry[$tokens[$index][1]];
}

// second pass to rename all the function invocations
$tokens = array_map(function($element) use ($registry){
    // check to see if it's a function identifier
    if(is_array($element) && $element[0] === T_STRING){
        // make sure it's one of our registered function names
        if(isset($registry[$element[1]])){
            // rename the variable
            $element[1] = $registry[$element[1]];
        }
    }
    return $element;
},$tokens);

// dump the tokens back out to rebuild the page with obfuscated names
foreach($tokens as $token){
    // array tokens carry their text at index 1, plain tokens are the text itself
    echo is_array($token) ? $token[1] : $token;
}

/**
 * https://stackoverflow.com/a/31107425/4233593
 * Generate a random string, using a cryptographically secure
 * pseudorandom number generator (random_int)
 *
 * For PHP 7, random_int is a PHP core function
 * For PHP 5.x, depends on https://github.com/paragonie/random_compat
 *
 * @param int $length      How many characters do we want?
 * @param string $keyspace A string of all possible characters
 *                         to select from
 * @return string
 */
function random_str($length, $keyspace = 'abcdefghijklmnopqrstuvwxyz')
{
    $str = '';
    $max = mb_strlen($keyspace, '8bit') - 1;
    for ($i = 0; $i < $length; ++$i) {
        $str .= $keyspace[random_int(0, $max)];
    }
    return $str;
}

Given this example.php

<?php

$example = 'some $string';

if(isset($_POST['something'])){
  echo $_POST['something'];
}

function exampleFunction($variable2){
  echo $variable2;
}

exampleFunction($example);

$variable3 = array('example','another');

foreach($variable3 as $key => $var3val){
  echo $var3val."somestring";
}

Produces this output:

<?php

$vsodjbobqokkaabv = 'some $string';

if(isset($_POST['something'])){
  echo $_POST['something'];
}

function gkfadicwputpvroj($zwnjrxupprkbudlr){
  echo $zwnjrxupprkbudlr;
}

gkfadicwputpvroj($vsodjbobqokkaabv);

$vfjzehtvmzzurxor = array('example','another');

foreach($vfjzehtvmzzurxor as $riuqtlravsenpspv => $mkdgtnpxaqziqkgo){
  echo $mkdgtnpxaqziqkgo."somestring";
}
Community
Jeff Puckett
  • Thanks a lot for your help. I can certainly make use of "token_get_all" to improve the code I already have, but what about dynamically generated function names and variable names? Can they be detected with "token_get_all"? At the moment, that's a huge security risk. – Dan Bray Dec 02 '16 at 03:31
  • 1
    @DanBray by dynamic, do you mean [variable functions](http://php.net/manual/en/functions.variable-functions.php) and [variable variables](http://php.net/manual/en/language.variables.variable.php)? If so, then no, I don't think this will work as is, but the parse approach is definitely preferable to a regex, so this code should be extensible for that. But as per your security concern, I think you should also be worried about [closures](http://php.net/manual/en/class.closure.php). – Jeff Puckett Dec 03 '16 at 02:20
  • Puckett Yes, precisely. For security reasons, I would like reject any code that contains "variable functions" and "variables variables". However, the "Closure" class, should already be unavailable because I have not white-listed "Closure". – Dan Bray Dec 03 '16 at 14:28
  • @DanBray well I mean it's not like anyone ever instantiates `new Closure` so it's the [anonymous functions](http://php.net/manual/en/functions.anonymous.php) that you need to watch out for. – Jeff Puckett Dec 04 '16 at 01:11
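
As a rough sketch of how the variable variables and variable functions raised in the comments above could at least be detected (so that such code can be rejected rather than renamed), the same token stream can be scanned for a bare `$` token or for a variable immediately followed by `(`. The function name `find_dynamic_constructs` is hypothetical, and the check is deliberately shallow: it does not cover `call_user_func`, calls such as `$array[0]()`, or interpolation like `"${name}"`.

```php
<?php
// Hypothetical helper, not part of the answer above: report dynamic
// constructs that a plain T_VARIABLE rename cannot follow.
function find_dynamic_constructs($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $found = array();

    for ($i = 0; $i < $count; $i++) {
        // index of the next token that is not whitespace
        $next = $i + 1;
        while ($next < $count && is_array($tokens[$next]) && $tokens[$next][0] === T_WHITESPACE) {
            $next++;
        }
        if ($next >= $count) {
            break;
        }

        // a bare '$' followed by a variable or '{' is a variable variable
        $nextIsVariable = is_array($tokens[$next]) && $tokens[$next][0] === T_VARIABLE;
        if ($tokens[$i] === '$' && ($nextIsVariable || $tokens[$next] === '{')) {
            $found[] = 'variable variable';
        }

        // a variable directly followed by '(' is a variable function call
        if (is_array($tokens[$i]) && $tokens[$i][0] === T_VARIABLE && $tokens[$next] === '(') {
            $found[] = 'variable function';
        }
    }

    return $found;
}
```

Calling it on `'<?php $a = "b"; $$a = 1; $f = "strlen"; echo $f("x");'` should report one variable variable and one variable function. Note that invoking a closure stored in a variable is flagged as well, which is probably what a whitelist-based checker wants anyway.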
7

EDIT 4.12.2016 - please see below! (after first answer)

I've just tried to find a solution which can handle both cases: your given case and this example from Elias Van Ootegem.

Of course it should be improved, as mentioned in one of my comments, but it works for your example:

$source = file_get_contents("source.php");

// this should get all Variables BUT isn't right at the moment if a variable is followed by an ' or " !!
preg_match_all('/\$[\$a-zA-Z0-9\[\'.*\'\]]*/', $source, $matches);
$matches = array_unique($matches[0]);

// this array saves all old and new variable names to track all replacements
$replacements = array();
$obfuscated_source = $source;
foreach($matches as $varName)
{
    do // generates random string and tests if it already is used by an earlier replaced variable name
    {
        // generate a random string -> should be improved.
        $randomName = substr(md5(rand()), 0, 7);
        // ensure that first part of variable name is a character.
        // there could also be a random character...
        $randomName = "a" . $randomName;
    }
    while(in_array("$" . $randomName, $replacements));

    if(substr($varName, 0,8) == '$GLOBALS')
    {
        // this handles the case of GLOBALS variables
        $delimiter = substr($varName, 9, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$GLOBALS[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,8) == '$_SERVER')
    {
        // this handles the case of SERVER variables
        $delimiter = substr($varName, 9, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_SERVER[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,5) == '$_GET')
    {
        // this handles the case of GET variables
        $delimiter = substr($varName, 6, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_GET[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,6) == '$_POST')
    {
        // this handles the case of POST variables
        $delimiter = substr($varName, 7, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_POST[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,7) == '$_FILES')
    {
        // this handles the case of FILES variables
        $delimiter = substr($varName, 8, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_FILES[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,9) == '$_REQUEST')
    {
        // this handles the case of REQUEST variables
        $delimiter = substr($varName, 10, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_REQUEST[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,9) == '$_SESSION')
    {
        // this handles the case of SESSION variables
        $delimiter = substr($varName, 10, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_SESSION[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,5) == '$_ENV')
    {
        // this handles the case of ENV variables
        $delimiter = substr($varName, 6, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_ENV[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 0,8) == '$_COOKIE')
    {
        // this handles the case of COOKIE variables
        $delimiter = substr($varName, 9, 1);
        if($delimiter == '$') $delimiter = '';
        $newName = '$_COOKIE[' .$delimiter . $randomName . $delimiter . ']'; 
    }
    else if(substr($varName, 1, 1) == '$')
    {
        // this handles the case of variable variables
        $name = substr($varName, 2, strlen($varName)-2);
        $pattern = '/(?=\$)\$' . $name . '.*;/';
        preg_match_all($pattern, $source, $varDeclaration);
        $varDeclaration = $varDeclaration[0][0];

        preg_match('/\s*=\s*["\'](?:\\.|[^"\\]])*["\']/', $varDeclaration, $varContent);
        $varContent = $varContent[0];

        preg_match('/["\'](?:\\.|[^"\\]])*["\']/', $varContent, $varContentDetail);
        $varContentDetail = substr($varContentDetail[0], 1, strlen($varContentDetail[0])-2);

        $replacementDetail = str_replace($varContent, substr($replacements["$" . $varContentDetail], 1, strlen($replacements["$" . $varContentDetail])-1), $varContent);

        $explode = explode($varContentDetail, $varContent);
        $replacement = $explode[0] . $replacementDetail . $explode[1];
        $obfuscated_source = str_replace($varContent, $replacement, $obfuscated_source);
    }
    else
    {
        $newName = '$' . $randomName;   
    }

    $obfuscated_source = str_replace($varName, $newName, $obfuscated_source);

    $replacements[$varName] = $newName;
}

// this part may be useful to change hard-coded returns of functions.
// it changes all remaining words in the document which are like the previous changed variable names to the new variable names
// attention: if the variables in the document have common names it could also change text you don't like to change...
foreach($replacements as $before => $after)
{
    $name_before = str_replace("$", "", $before);
    $name_after = str_replace("$", "", $after);
    $obfuscated_source = str_replace($name_before, $name_after, $obfuscated_source);
}

// here you can place code to write back the obfuscated code to the same or to a new file, e.g:
$file = fopen("result.php", "w");
fwrite($file, $obfuscated_source);
fclose($file);

EDIT there are still some cases left which require some effort. At least some kinds of variable declarations may not be handled correctly!

Also, the first regex is not perfect. My current version is '/\$\$?[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/', but this does not capture the index values of predefined variables. Still, I think it has some potential: if you use it like here, you get all 18 involved variables. The next step could be to determine whether a [..] follows the variable name. If so, any predefined variable AND cases like $g = $GLOBALS; and any further use of such a $g would be covered...
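
To illustrate what that improved pattern picks up, here is a small sketch that runs it over a made-up three-line source (the sample code and the `$names` variable are assumptions for the illustration only). It finds ordinary variables and the `$$b` variable variable, but it also matches `$five` inside a single-quoted string, which is exactly the kind of false positive raised in the comments.

```php
<?php
// Made-up sample source; the nowdoc keeps the '$' signs literal.
$source = <<<'PHP'
$a = 'costs $five';
$b = "a";
echo $$b;
PHP;

// the improved pattern quoted above: optional second '$', then a valid identifier
preg_match_all('/\$\$?[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/', $source, $matches);

// unique names in order of first appearance
$names = array_values(array_unique($matches[0]));
print_r($names);
```

The list contains `$a`, `$five`, `$b` and `$$b`; a regex alone has no way of knowing that `$five` is just text.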


EDIT 4.12.2016

Prompted by LSerni and several comments on both the original question and some of the answers, I also wrote a parsing solution, which you can find below. It handles an extended example file, which was my aim. If you find any other challenge, please tell me!

new solution:

 $variable_names_before = array();
 $variable_names_after  = array();
 $function_names_before = array();
 $function_names_after  = array();
 $forbidden_variables = array(
    '$GLOBALS',
    '$_SERVER',
    '$_GET',
    '$_POST',
    '$_FILES',
    '$_COOKIE',
    '$_SESSION',
    '$_REQUEST',
    '$_ENV',
 );
 $forbidden_functions = array(
     'unlink'
 );

 // read file
 $data = file_get_contents("example.php");

 $lock = false;
 $lock_quote = '';
 for($i = 0; $i < strlen($data); $i++)
 {
     // check if there are quotation marks
     if(($data[$i] == "'" || $data[$i] == '"'))
     {
         // if first quote
         if($lock_quote == '')
         {
             // remember quotation mark
             $lock_quote = $data[$i];
             $lock = true;
         }
         else if($data[$i] == $lock_quote)
         {
             $lock_quote = '';
             $lock = false;
         }
     }

     // detect variables
     if(!$lock && $data[$i] == '$')
     {
         $start = $i;
         // detect variable variable names
         if(isset($data[$i+1]) && $data[$i+1] == '$')
         {
             $start++;
             // increment $i to avoid second detection of variable variable as "normal variable"
             $i++;
         }

         $end = 1;
         // find end of variable name
         while(isset($data[$start+$end]) && (ctype_alpha($data[$start+$end]) || is_numeric($data[$start+$end]) || $data[$start+$end] == "_"))
         {
             $end++;
         }
         // extract variable name
         $variable_name = substr($data, $start, $end);
         if($variable_name == '$')
         {
             continue;
         }
         // check if variable name is allowed
         if(in_array($variable_name, $forbidden_variables))
         {
             // forbidden variable detected, do whatever you want!
         }
         else
         {
             // check if variable name already has been detected
             if(!in_array($variable_name, $variable_names_before))
             {
                 $variable_names_before[] = $variable_name;
                 // generate random name for variable
                 $new_variable_name = "";
                 do
                 {
                     $new_variable_name = random_str(rand(5, 20));
                 }
                 while(in_array($new_variable_name, $variable_names_after));
                 $variable_names_after[] = $new_variable_name;
             }
             //var_dump("variable: " . $variable_name);
         }
     }

     // detect function-definitions
     // the third condition checks if the symbol before 'function' is neither a character nor a number
     if(!$lock && strtolower(substr($data, $i, 8)) == 'function' && (!ctype_alpha($data[$i-1]) && !is_numeric($data[$i-1])))
     {
         // find end of function name
         $end = strpos($data, '(', $i);
         // extract function name and remove possible spaces on the right side
         $function_name = rtrim(substr($data, ($i+9), $end-$i-9));
         // check if function name is allowed
         if(in_array($function_name, $forbidden_functions))
         {
             // forbidden function detected, do whatever you want!
         }
         else
         {
             // check if function name already has been detected
             if(!in_array($function_name, $function_names_before))
             {
                 $function_names_before[] = $function_name;
                 // generate random name for function
                 $new_function_name = "";
                 do
                 {
                     $new_function_name = random_str(rand(5, 20));
                 }
                 while(in_array($new_function_name, $function_names_after));
                 $function_names_after[] = $new_function_name;
             }
             //var_dump("function: " . $function_name);
         }
     }
 }

// this array contains prefixes and suffixes for string literals which
// may contain variable names.
// if string literals as a return of functions should not be changed
// remove the last two inner arrays of $possible_pre_suffixes
// this will enable correct handling of situations like
// - $func = 'getNewName'; echo $func();
// but it will break variable variable names like
// - ${getNewName()}
$possible_pre_suffixes = array(
    array(
        "prefix" => "= '",
        "suffix" => "'"
    ),
    array(
        "prefix" => '= "',
        "suffix" => '"'
    ),
    array(
        "prefix" => "='",
        "suffix" => "'"
    ),
    array(
        "prefix" => '="',
        "suffix" => '"'
    ),
    array(
        "prefix" => 'rn "', // return " ";
        "suffix" => '"'
    ),
    array(
        "prefix" => "rn '", // return ' ';
        "suffix" => "'"
    )
);
// replace variable names
for($i = 0; $i < count($variable_names_before); $i++)
{
    $data = str_replace($variable_names_before[$i], '$' . $variable_names_after[$i], $data);

    // try to find strings which equals variable names
    // this is an attempt to handle situations like:
    // $a = "123";
    // $b = "a";    <--
    // $$b = "321"; <--

    // and also
    // function getName() { return "a"; }
    // echo ${getName()};
    $name = substr($variable_names_before[$i], 1);
    for($j = 0; $j < count($possible_pre_suffixes); $j++)
    {
        $data = str_replace($possible_pre_suffixes[$j]["prefix"] . $name . $possible_pre_suffixes[$j]["suffix"],
                            $possible_pre_suffixes[$j]["prefix"] . $variable_names_after[$i] . $possible_pre_suffixes[$j]["suffix"],
                            $data);
    }
}
// replace function names
for($i = 0; $i < count($function_names_before); $i++)
{
    $data = str_replace($function_names_before[$i], $function_names_after[$i], $data);
}

// output the obfuscated source
echo $data;

/**
 * https://stackoverflow.com/a/31107425/4233593
 * Generate a random string, using a cryptographically secure
 * pseudorandom number generator (random_int)
 *
 * For PHP 7, random_int is a PHP core function
 * For PHP 5.x, depends on https://github.com/paragonie/random_compat
 *
 * @param int $length      How many characters do we want?
 * @param string $keyspace A string of all possible characters
 *                         to select from
 * @return string
 */
function random_str($length, $keyspace = 'abcdefghijklmnopqrstuvwxyz')
{
    $str = '';
    $max = mb_strlen($keyspace, '8bit') - 1;
    for ($i = 0; $i < $length; ++$i)
    {
        $str .= $keyspace[random_int(0, $max)];
    }
    return $str;
}

example input file:

$example = 'some $string';
$test = '$abc 123' . $example . '$hello here I "$am"';

if(isset($_POST['something'])){
  echo $_POST['something'];
}

function exampleFunction($variable2){
  echo $variable2;
}

exampleFunction($example);

$variable3 = array('example','another');

foreach($variable3 as $key => $var3val){
  echo $var3val."somestring";
}

$test = "example";
$$test = 'hello';

exampleFunction($example);
exampleFunction($$test);

function getNewName()
{
    return "test";
}
exampleFunction(${getNewName()});

output of my function:

$fesvffyn = 'some $string';
$zimskk = '$abc 123' . $fesvffyn . '$hello here I "$am"';

if(isset($_POST['something'])){
  echo $_POST['something'];
}

function kainbtqpybl($yxjvlvmyfskwqcevo){
  echo $yxjvlvmyfskwqcevo;
}

kainbtqpybl($fesvffyn);

$lmiphctfgjfdnonjpia = array('example','another');

foreach($lmiphctfgjfdnonjpia as $qypdfcpcla => $gwlpcpnvnhbvbyflr){
  echo $gwlpcpnvnhbvbyflr."somestring";
}

$zimskk = "fesvffyn";
$$zimskk = 'hello';

kainbtqpybl($fesvffyn);
kainbtqpybl($$zimskk);

function tauevjkk()
{
    return "zimskk";
}
kainbtqpybl(${tauevjkk()});

I know there are some cases left, where you can find an issue with variable variable names, but then you may have to expand the $possible_pre_suffixes array...

Maybe you also want to differentiate between global variables and "forbidden variables"...
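
One remaining case worth sketching: because the replacement loop uses str_replace one name at a time, a variable whose name is a prefix of another (say `$key` and `$keys`) can be rewritten twice. A possible way around this, shown here only as an assumption-laden sketch with made-up names and a hypothetical `$map` array, is to build a single old-to-new map and apply it with strtr, which tries the longest key first and never rescans text it has already replaced.

```php
<?php
// Made-up names for illustration: '$key' is a prefix of '$keys'.
$map = array(
    '$key'  => '$qwxz',
    '$keys' => '$plmv',
);
$source = 'foreach ($keys as $key) { echo $key; }';

// str_replace works through the pairs in order, so '$key' also hits '$keys'
$broken = str_replace(array_keys($map), array_values($map), $source);

// strtr with an array tries the longest key first and does not rescan its output
$fixed = strtr($source, $map);

echo $broken, PHP_EOL; // foreach ($qwxzs as $qwxz) { echo $qwxz; }
echo $fixed, PHP_EOL;  // foreach ($plmv as $qwxz) { echo $qwxz; }
```

This only helps when every variable in the file is present in the map; it does nothing for the string-literal cases discussed above.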

Community
SaschaP
  • 2
    ouch `substr(md5(rand()), 0, 7)` <-- hash collision risk is quite large with this one – Elias Van Ootegem Sep 04 '15 at 13:35
  • this part was just an attempt for any other random string generator. – SaschaP Sep 04 '15 at 13:36
  • Sorry mate, but it doesn't work: _"PHP Parse error: syntax error, unexpected '33221' (T_LNUMBER), expecting variable (T_VARIABLE) or '$' in /home/elias/out.php on line 10"_, and running the script produces a lot of warnings – Elias Van Ootegem Sep 04 '15 at 13:40
  • [The result](http://pastebin.com/1Rv9GE1d): First the warnings, then the output, then the source.php file, then a the code you've posted with which I got the output – Elias Van Ootegem Sep 04 '15 at 13:45
  • Either way, +1 for effort, but it must be said, your approach completely breaks down the moment someone writes: `$g = $GLOBALS;` and uses `$g` as shorthand for the super-global. You really need parsing, which regex's won't be able to do – Elias Van Ootegem Sep 04 '15 at 13:47
  • 2 notices. The one you're working on (undefined variable: foobar, because of the function) alongside with the `var_dump` at the end – Elias Van Ootegem Sep 04 '15 at 13:51
  • @EliasVanOotegem yes, indeed my approach is by far not perfect! It could just be an assistance for the OP. If I will have some time in the next days I will try to improve the code the consider more styles how php code can be written. – SaschaP Sep 04 '15 at 13:52
  • Also: work on your regex's a bit: `(=| =| = )` can be written as `\s*=\s*`, and this one: `'/\$[\$a-zA-Z0-9\[\'.*\'\]]*/'` is flawed, too: `'/\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\[[\'"]|\$[^]\'"]+[\'"]?\]]*/'` makes more sense (the alternative regex might not work, I just wrote it off the top of my head...) – Elias Van Ootegem Sep 04 '15 at 13:54
  • Your code also obfuscates strings in single quotes: `$format = '$stuff -> ...';` will change the string that's being assigned to the variable, which it shouldn't do. It _should_ change it if double quotes are used **and** the `$` sign isn't being escaped... and if that weren't enough, there's the possibility of `eval` being used to access the variables you're actually obfuscating :P – Elias Van Ootegem Sep 04 '15 at 14:15
  • @EliasVanOotegem at least now I'm starting to stumble a bit :-) Your regexp gave me a good start but the second part to handle the squared brackets seems to be wrong. My regex-answer could solve the problem of the OP but I agree your first rejection of a general regex-solution! As reflected in the last paragraph of my answer I possibly see a way to improve the code, but I'm sure you will find code snippets which would require exponentially more code ;-) Nevertheless I'm interested to build a class out of it, if I would have more spare time! – SaschaP Sep 04 '15 at 21:02
  • @SaschaP , you obviously have a remarkable talent. I'd love to see it invested in a *parsing* solution. I've nothing against regex's, but what was said re: HTML in http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 , applies to PHP **in spades**. Please, don't go there :-) – LSerni Dec 01 '16 at 17:13
  • @LSerni thanks! I'm really a regex beginner and wrote my answer with Elias' help to show that disputing regex in general is wrong. I know this answer you pasted the link to, it's great ;-) So I will try my best creating a parsing solution, but It will be very hard after Jeff Puckett II's answer, which is excellent in my eyes (besides the fact it can't handle variable variable names). – SaschaP Dec 04 '16 at 10:59
  • This changes string literals for all instances that match a variable name, so it would break any instance of string usage that coincidentally matched one of those names. – Jeff Puckett Dec 04 '16 at 15:47
  • @JeffPuckettII it changes the string literal on purpose to echo the new name of the variable. If this is not desired just remove the last two inner arrays of `$possible_pre_suffixes`. This would also enable variable functions like `$func = 'getNewName'; echo $func();`, but it will break variable variable names like `${getNewName()}` – SaschaP Dec 04 '16 at 15:50
  • Exactly, it's a slippery tradeoff. There's probably a better solution. – Jeff Puckett Dec 04 '16 at 15:52
  • The parsing solution you wrote is excellent. I can modify it to fit my needs. I will still need to use "token_get_all" to detect functions that are being used that have not been declared, and to detect commands being used that I have not white-listed. As for variable functions and variable variables, I do not need to rename them, just detect them and display an error message to the client. – Dan Bray Dec 04 '16 at 16:27
  • @DanBray to detect functions which are used without being declared you could check my array `$function_names_before` against [PHP's get_defined_functions()](http://php.net/manual/de/function.get-defined-functions.php). Do you need any further help? – SaschaP Dec 04 '16 at 16:32
  • I don't think I need any more help. If I do, I will let you know shortly. First I am making some changes to your code, then it will be time to award the bounty. Thank you very much for your help. – Dan Bray Dec 04 '16 at 17:20
  • I'm having to rewrite the parser from scratch because it's more efficient to loop through the data once, and to rename variables and functions straight away, instead of by storing them in an array and renaming them afterwards. Instead of using "str_replace" or a similar function, I just add the modified code to a blank variable. – Dan Bray Dec 05 '16 at 15:24
6

Well, you can try to write your own, but the number of strange things you have to handle is likely to overwhelm you, and I presume you are more interested in using such a tool than writing and maintaining one yourself. (There are lots of broken PHP obfuscators out there, where people have tried to do this.)

If you want one that is reliable, you do have to base it on a parser, or your tool will mis-parse the text and handle it wrong (this is the first "strange thing"). Regexes simply won't do the trick.

The Semantic Designs PHP Obfuscator (from my company), taken out of the box, took this slightly modified version of Elias Van Ootegem's example:

 <?php

//non-obfuscated

function getVarname()
{//the return value has to change
return (('foobar'));
}

$format = '%s = %d';
$foobar = 123;

$variableVar = (('format'));//you need to change this string

printf($$variableVar, $variableVar = getVarname(), $$variableVar);

echo PHP_EOL;

var_dump($GLOBALS[(('foobar'))]);//note the key == the var

and produced this:

<?php function l0() { return (('O0')); } $l1="%\163 = %d"; $O1=0173; $l2=(('O2')); printf($$l2,$l2=l0(),$$l2); echo PHP_EOL; var_dump($GLOBALS[(('O0'))]);

The key issue in Elias's example is strings that actually contain variable names. In general, there is no way for a tool to know that "x" is a variable name, and not just the string containing the letter x. But the programmers know. We insist that such strings be marked [by enclosing them in ((..))], and then the obfuscator can obfuscate their content properly. Sometimes the string contains variable names and other things; in that case, the programmer has to break up the string into "variable name" content and everything else. This is pretty easy to do in practice, and is the "slight change" I made to his supplied code. Other strings, not being marked, are left alone. You only have to do this once to the source file. [You can say this is cheating, but no other practical answer will work; the tool cannot know reliably. Halting Problem, if you insist.]
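To make the marking convention concrete, here is a minimal sketch built on PHP's own tokenizer. It is not the Semantic Designs implementation (which is not shown in this answer); `renameMarkedStrings` and the `$map` argument are hypothetical names, and the sketch assumes there is no whitespace inside the `((..))` marker.

```php
<?php
// Hypothetical helper: rewrite only string literals wrapped in ((..)).
// $map is assumed to come from the renaming pass: old identifier => new identifier.
function renameMarkedStrings(string $src, array $map): string
{
    $tokens = token_get_all($src);
    $n = count($tokens);

    for ($i = 2; $i < $n - 2; $i++) {
        $t = $tokens[$i];
        // A marked string is a plain string literal with "((" before and "))" after it.
        $isMarked = is_array($t)
            && $t[0] === T_CONSTANT_ENCAPSED_STRING
            && $tokens[$i - 1] === '(' && $tokens[$i - 2] === '('
            && $tokens[$i + 1] === ')' && $tokens[$i + 2] === ')';
        if (!$isMarked) {
            continue;
        }
        $quote = $t[1][0];              // the opening quote character
        $name  = substr($t[1], 1, -1);  // the text between the quotes
        if (isset($map[$name])) {
            $tokens[$i][1] = $quote . $map[$name] . $quote;
        }
    }

    // Rebuild the source: array tokens carry their text in [1], others are plain strings.
    $out = '';
    foreach ($tokens as $t) {
        $out .= is_array($t) ? $t[1] : $t;
    }
    return $out;
}

$src = "<?php \$a = (('foobar')); echo 'foobar';";
echo renameMarkedStrings($src, ['foobar' => 'O0']);
```

The unmarked `'foobar'` is left alone, which is the point of the convention. Note that this sketch would also treat a call such as `f(('x'))` as marked, so a real tool needs a stricter rule.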

The next thing to get right is reliable obfuscation across multiple files. You can't do this one file at a time. This obfuscator has been used on very big PHP applications (thousands of PHP script files).
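The reason one file at a time fails is that a name assigned in one file must map to the same new name everywhere else. A rough sketch of that idea, assuming a rename map shared across all files (the function name, the `$g` prefix, and the two inline "files" are made up for illustration, and scoping is ignored entirely):

```php
<?php
// Hypothetical helper: $map is passed by reference so every file reuses
// the names that earlier files have already been given.
function renameVariablesInFile(string $src, array &$map): string
{
    $skip = ['$this', '$GLOBALS', '$_GET', '$_POST', '$_REQUEST',
             '$_SERVER', '$_COOKIE', '$_SESSION', '$_FILES', '$_ENV'];
    $out = '';
    foreach (token_get_all($src) as $token) {
        if (is_array($token) && $token[0] === T_VARIABLE && !in_array($token[1], $skip, true)) {
            // Assign a new name the first time a variable is seen, then reuse it.
            $map[$token[1]] = $map[$token[1]] ?? '$g' . count($map);
            $token[1] = $map[$token[1]];
        }
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

// Two in-memory "files" standing in for a real project tree.
$files = [
    'config.php' => '<?php $dbName = "shop";',
    'index.php'  => '<?php include "config.php"; echo $dbName;',
];
$map = [];
foreach ($files as $name => $src) {
    $files[$name] = renameVariablesInFile($src, $map);
}
```

With a fresh map per file, `$dbName` could end up with two different names and `index.php` would break.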

Yes, it does use a full PHP parser. Not nikic's.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • Upvoting a lot of very good points, from "don't roll your own" to the 'halting problem' stuff. Pity I'm only allowed one upvote. – LSerni Dec 01 '16 at 17:06
2

I ended up with this simple code:

$tokens = token_get_all($src);
$skip = array('$this','$_GET','$_POST','$_REQUEST','$_SERVER','$_COOKIE','$_SESSION');
function renameVars($tokens,$content,$skip){
  $vars = array();
  foreach($tokens as $token) {
      // token_get_all() returns plain strings for single characters such as ';'
      if (is_array($token) && $token[0] == T_VARIABLE && !in_array($token[1],$skip))
          $vars[generateRandomString()]= $token[1];
  }
  $vars = array_unique($vars);
  $vars2 = $vars;

  foreach($vars as $new => $old){
    foreach($vars2 as $var){
      if($old!=$var && strpos($var,$old)!==false){
        continue 2;
      }
    }  
    $content = str_replace($old,'${"'.$new.'"}',$content);
    //function(${"example"}) will trigger error. This is why we need this:
    $content = str_replace('(${"'.$new.'"}','($'.$new,$content);
    $content = str_replace(',${"'.$new.'"}',',$'.$new,$content);
    $chars = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
    //for things like function deleteExpired(Varien_Event_Observer $fz5eDWIt1si), Exception, 
    foreach($chars as $char){
      $content = str_replace($char.' ${"'.$new.'"}',$char.' $'.$new,$content);
    }           
  }
  return $content;
}

It works for me because the code is simple. I guess it won't work in all scenarios.
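One scenario where `str_replace` struggles is a name that is a prefix of another (`$var` and `$variable`), or a name that also appears inside a single-quoted string. A possible alternative, sketched under the assumption that sequential names are acceptable, is to rebuild the source from the tokens and change only `T_VARIABLE` tokens. `renameVariablesByToken` and the `$v0`, `$v1` naming scheme are illustrative, not part of the code above.

```php
<?php
// Hypothetical helper: rename variables token by token instead of with str_replace.
function renameVariablesByToken(string $src, array $skip): string
{
    $map = [];  // old name => new name
    $out = '';
    foreach (token_get_all($src) as $token) {
        if (!is_array($token)) {   // single-character tokens such as ';' or '('
            $out .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_VARIABLE && !in_array($text, $skip, true)) {
            if (!isset($map[$text])) {
                $map[$text] = '$v' . count($map);
            }
            $text = $map[$text];
        }
        $out .= $text;
    }
    return $out;
}

$skip = ['$this', '$GLOBALS', '$_GET', '$_POST', '$_REQUEST', '$_SERVER', '$_COOKIE', '$_SESSION'];
$src  = '<?php $var = 1; $variable = $var + 1; echo "$variable items", \'$var\';';
echo renameVariablesByToken($src, $skip);
```

Variables interpolated in double-quoted strings are separate tokens, so they are renamed too, while the single-quoted `'$var'` stays untouched. This sketch still does not handle `${name}` interpolation, variable variables, `compact()`/`extract()`, or properties accessed as `$this->name`.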

JohnyFree
  • 1,319
  • 3
  • 22
  • 35
0

I have it working now, but there may still be some vulnerabilities because PHP allows function names and variable names to be generated dynamically.

The first function replaces $_SESSION, $_POST etc. with functions:

function replaceArrayVariable($str, $arr, $function)
{
    $str = str_replace($arr, $function, $str);
    $lastPos = 0;

    while (($lastPos = strpos($str, $function, $lastPos)) !== false)
    {
        $lastPos = $lastPos + strlen($function);
        $currentPos = $lastPos;
        $openSqrBrackets = 1;
        while ($openSqrBrackets > 0)
        {
            if ($str[$currentPos] === '[')
                 $openSqrBrackets++;
            elseif ($str[$currentPos] === ']')
                 $openSqrBrackets--;
            $currentPos++;
        }
        $str[$currentPos - 1] = ')';
    }
    return $str;
}

The second renames functions, ignoring whitelisted keywords:

function renameFunctions($str)
{
    preg_match_all('/[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/', $str, $matches, PREG_OFFSET_CAPTURE);
    $totalMatches = count($matches[0]);
    $offset = 0;
    for ($i = 0; $i < $totalMatches; $i++)
    {
        $matchIndex = $matches[0][$i][1] + $offset;
        if ($matchIndex === 0 || $str[$matchIndex - 1] !== '$')
        {
            $keyword = $matches[0][$i][0];
            if ($keyword !== 'true' && $keyword !== 'false' && $keyword !== 'if' && $keyword !== 'else' && $keyword !== 'getPost' && $keyword !== 'getSession')
            {
                $str = substr_replace($str, 'qq', $matchIndex, 0);
                $offset += 2;
            }
        }
    }
    return $str;
}

Then to rename functions, variables, and non-whitelisted keywords, I use this code:

$str = replaceArrayVariable($str, '$_POST[', 'getPost(');
$str = replaceArrayVariable($str, '$_SESSION[', 'getSession(');
preg_match_all('/\'(?:\\\\.|[^\\\\\'])*\'|.[^\']+/', $str, $matches);
$str = '';
foreach ($matches[0] as $match)
{
    if ($match[0] != "'")
    {
        $match = preg_replace('!\s+!', ' ', $match);
        $match = renameFunctions($match);
        $match = str_replace('$', '$qq', $match);
    }
    $str .= $match;
}
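The dynamically generated names mentioned at the top of this answer can at least be detected before the code runs. Below is a rough sketch using `token_get_all`; `findDynamicNames` is a hypothetical helper, it only covers variable variables and calls through a variable, and it is not a complete safety check.

```php
<?php
// Hypothetical helper: returns a list of [line, message] pairs.
// An empty list means nothing was flagged by these two checks.
function findDynamicNames(string $src): array
{
    // Drop whitespace and comments so "the next token" is easy to inspect.
    $tokens = array_values(array_filter(
        token_get_all($src),
        function ($t) {
            return !is_array($t)
                || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
        }
    ));

    $problems = [];
    $line = 1;
    foreach ($tokens as $i => $t) {
        if (is_array($t)) {
            $line = $t[2];  // array tokens carry their line number
        }
        $next = $tokens[$i + 1] ?? null;
        if ($t === '$') {
            // A bare '$' token only appears in $$name or ${expr}.
            $problems[] = [$line, 'variable variable'];
        } elseif (is_array($t) && $t[0] === T_VARIABLE && $next === '(') {
            // $name( ... ) calls whatever function name the variable holds.
            $problems[] = [$line, 'variable function call'];
        }
    }
    return $problems;
}

$code = '<?php $a = "b"; echo $$a; $f = "strlen"; echo $f("x"); echo strlen("x");';
foreach (findDynamicNames($code) as [$line, $message]) {
    echo "line $line: $message\n";
}
```

Other dynamic constructs, such as `call_user_func`, `eval`, or `"${name}"` inside strings, would need their own checks.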
halfer
  • 19,824
  • 17
  • 99
  • 186
Dan Bray
  • 7,242
  • 3
  • 52
  • 70
  • 1
    Nobody can make your code 100% secure; a determined thief can always undo encryption/obfuscation (consider the Chinese government as an adversary). But what you asked for in the bounty request was a tool that could rename "variables, functions, commands". You can't rename commands; "goto" has to stay "goto". You can rename the variables and functions. Our PHP Obfuscator does this, I claim reliably. Are you claiming it doesn't work? Did you look at the example obfuscation in the answer? Did you try it? What specifically doesn't it do? – Ira Baxter Dec 01 '16 at 14:18
  • The php code doesn't need to be encrypted or obfuscated to make it unreadable. It simply needs to be safe and secure to execute. I would have explained that in the question, but the last edit I made was rejected. I can rename commands. If I rename "goto" to "qqgoto", then the "goto" command has been disabled and cannot be used by the client. Commands such as, "unlink" could wipe everything from the server, so I will replace it with a function that checks if the client has permission to delete the file. – Dan Bray Dec 01 '16 at 15:15
  • The code that I have in my answer works fine, but I am unsure whether modifying it to use "token_get_all" would make it more efficient. I certainly need to prevent dynamically generated functions and variables because it's essential that the client cannot access functions and variables that are outside of the framework that I am making. – Dan Bray Dec 01 '16 at 15:20
  • I don't understand what you mean by renaming "goto" to "qqgoto". If you do that *in your code*, the PHP engine will reject your code when it encounters a "qqgoto" statement and your program stops working altogether. That hardly seems useful. If you want to remove "goto" from the PHP engine's capability, you can do that by building a custom implementation of PHP, but nobody will agree to put it on their server, so that hardly seems useful either. If you want to *prevent* people from using goto or dynamically generated variables, you don't want an obfuscator... – Ira Baxter Dec 01 '16 at 16:15
  • ... you want a static analysis tool that scans your code and complains when it finds a construct you don't like. (Such tools are generally called "style checkers" and have nothing to do with the question title here "renaming variables"). Even if you have such a checking tool, what is to stop a programmer from taking code that passes the style check, and then simply inserting a construct you don't like? What exactly do you want to do? "Make the code unreadable" or "prevent certain coding styles"? I'm beginning to think you have an XY problem. Look it up. – Ira Baxter Dec 01 '16 at 16:18
  • I don't care whether the code is readable or not because nobody will ever see it, nor do I care about restricting certain coding styles. I simply want to force the client to use the framework I am making, so that a client cannot write any malicious code. Unless I decide to allow "goto" statements as part of the framework, then the desired result is for any code that contains "goto" statements to not work. Later, I will improve upon that by throwing specific error messages to the client. The client doesn't upload any code, they enter it onscreen into a Codemirror window. – Dan Bray Dec 01 '16 at 17:02
  • 2
    If you can force the client to submit the code before it is used, then you can run a "stylechecker" to check for constructs you don't like. But that makes your bonus on this question completely wrongheaded, because this question is about renaming variables, not stylechecking. – Ira Baxter Dec 01 '16 at 17:10
  • 1
    To meet your stated requirement, you need a *stylechecker*. Your real problem will be getting one that checks the style rules you want. You might consider rolling your own from scratch using TOKEN_GET_ALL (or whatever you say to PHP to get all the tokens of a file), but that way lies madness, or at least a probable failure to build a serious style checker. See http://www.semdesigns.com/Products/DMS/LifeAfterParsing.html – Ira Baxter Dec 01 '16 at 17:12