25

I'm trying to enforce a root directory in a filesystem abstraction. The problem I'm encountering is the following:

The API lets you read and write files, not only on local storage but also on remote storage, so there's all kinds of normalisation going on under the hood. At the moment it doesn't support relative paths, so something like this isn't possible:

$filesystem->write('path/to/some/../relative/file.txt', 'file contents');

I want to be able to securely resolve the path so that the output would be: path/to/relative/file.txt. As stated in a GitHub issue created for this bug/enhancement (https://github.com/FrenkyNet/Flysystem/issues/36#issuecomment-30319406), it needs to do more than just split the path into segments and remove them accordingly.

Also, since the package handles remote filesystems and non-existing files, realpath is out of the question.

So, how should one go about resolving these paths?

Frank de Jonge
  • 1
    How about `realpath(dirname($path))` ? – nice ass Dec 11 '13 at 15:13
  • 5
    realpath needs a path to exist on the local filesystem, which is not the case for writes and totally not usable on remote filesystems – Frank de Jonge Dec 11 '13 at 15:15
  • 2
    I don't see how you can determine the absolute path of a non-existent relative path. You need at least the subpath that includes the dots to exist – nice ass Dec 11 '13 at 15:20
  • Not exactly, you could also replace every `../` that has a leading segment with an [empty-string], but that has security risks, as I mentioned in the GitHub issue. – Frank de Jonge Dec 11 '13 at 17:27
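
One way plain string replacement can go wrong, as a minimal sketch. The exact risk discussed in the GitHub issue is not reproduced here, and `naiveStrip` is a hypothetical helper, not Flysystem code. It assumes the replacement is done in a single pass with `str_replace`, which does not rescan its own output:

```php
<?php
// Hypothetical helper (not part of Flysystem): removes every "../" in one pass.
function naiveStrip(string $path): string
{
    return str_replace('../', '', $path);
}

// Removing the inner "../" leaves the outer characters to form a new "../".
var_dump(naiveStrip('....//secret.txt')); // string(13) "../secret.txt"
```

This is why the answers below walk the path segment by segment instead of rewriting the string.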

4 Answers

12

To quote Jamie Zawinski:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

protected function getAbsoluteFilename($filename) {
  $path = [];
  foreach(explode('/', $filename) as $part) {
    // ignore parts that have no value
    if (empty($part) || $part === '.') continue;

    if ($part !== '..') {
      // cool, we found a new part
      array_push($path, $part);
    }
    else if (count($path) > 0) {
      // going back up? sure
      array_pop($path);
    } else {
      // now, here we don't like
      throw new \Exception('Climbing above the root is not permitted.');
    }
  }

  // prepend my root directory
  array_unshift($path, $this->getPath());

  return join('/', $path);
}
Beat Christen
  • 2
    A few comments: (1) Using `empty()` is dangerous because it will skip directories with the name `0` or `0.0`. (2) You should probably use `DIRECTORY_SEPARATOR` instead of `/`. (3) This works fine also on arbitrary paths that are not filenames, so it should probably be called `getAbsolutePath`. – jlh Mar 29 '19 at 08:06
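
A rough sketch of how the points in the comment above might be applied, not the answerer's own revision. `resolvePathUnderRoot` and its `$root` parameter are hypothetical stand-ins for the method and `$this->getPath()`. It compares against `''` instead of calling `empty()`, so segments named `0` survive. It keeps `/` rather than `DIRECTORY_SEPARATOR`, on the assumption that paths for remote storage use forward slashes:

```php
<?php
// Sketch only: the function name and $root parameter are hypothetical.
function resolvePathUnderRoot(string $root, string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $part) {
        // Strict comparison: skips empty and "." segments but keeps "0".
        if ($part === '' || $part === '.') {
            continue;
        }
        if ($part !== '..') {
            $stack[] = $part;          // ordinary segment: push
        } elseif (count($stack) > 0) {
            array_pop($stack);         // "..": drop the previous segment
        } else {
            throw new RuntimeException('Climbing above the root is not permitted.');
        }
    }
    // Prepend the root without doubling its trailing slash.
    array_unshift($stack, rtrim($root, '/'));

    return implode('/', $stack);
}

echo resolvePathUnderRoot('/root', 'path/to/some/../relative/file.txt'), "\n";
// /root/path/to/relative/file.txt
```

Note that this is purely lexical: it never touches the filesystem, so it works for non-existing and remote paths, but it also cannot see symlinks.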
7

I've worked out how to do this; this is my solution:

/**
 * Normalize path
 *
 * @param   string  $path
 * @param   string  $separator
 * @return  string  normalized path
 */
public function normalizePath($path, $separator = '\\/')
{
    // Remove invisible Unicode characters (category C: control, format, unassigned) and a leading "./"
    $normalized = preg_replace('#\p{C}+|^\./#u', '', $path);

    // Path remove self referring paths ("/./").
    $normalized = preg_replace('#/\.(?=/)|^\./|\./$#', '', $normalized);

    // Regex for resolving relative paths
    $regex = '#\/*[^/\.]+/\.\.#Uu';

    while (preg_match($regex, $normalized)) {
        $normalized = preg_replace($regex, '', $normalized);
    }

    if (preg_match('#/\.{2}|\.{2}/#', $normalized)) {
        throw new LogicException('Path is outside of the defined root, path: [' . $path . '], resolved: [' . $normalized . ']');
    }

    return trim($normalized, $separator);
}
Frank de Jonge
  • "Path remove self referring paths ("/./")" does not work properly with paths ending with `..` e.g. `a/b/..`. Also, the "Regex for resolving relative paths" does not work properly with dirs with a dot prefix e.g. `a/.b/../c`. – bernie Dec 02 '14 at 22:11
  • 1
    I've improved on this initial implementation since posting it, and the newer version deals with those situations too. The code can be found here: https://github.com/thephpleague/flysystem/blob/master/src/Util.php#L80 – Frank de Jonge Dec 05 '14 at 18:57
  • If you edit the answer, I can remove my -1. Btw, any reason why you use regexes instead of splitting on dir separators and looping on the path parts, keeping a stack of path parts, popping the last path part when you encounter `..`? – bernie Dec 06 '14 at 01:09
  • Can this be relied upon to replace `realpath` in preventing directory traversal attacks, as per http://stackoverflow.com/a/4205278/2970321 ? – alexw Apr 25 '16 at 05:16
1

./ current location

../ one level up

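function normalize_path($str){
    $N = 0; // number of ".." segments still waiting to cancel a segment
    $A = explode("/", $str);
    $B = [];
    // walk the segments from right to left
    for ($i = count($A) - 1; $i >= 0; --$i) {
        $part = trim($A[$i]);
        if ($part === ".") {
            continue; // current location: skip it
        }
        if ($part === "..") {
            $N++; // one level up: cancel the next real segment to the left
        } elseif ($N > 0) {
            $N--;
        } else {
            $B[] = $A[$i];
        }
    }
    // note: ".." segments that would climb above the start are silently dropped
    return implode("/", array_reverse($B));
}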

so:

"a/b/c/../../d" -> "a/d"
 "a/./b" -> "a/b"
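
The right-to-left scan above drops any `..` that has nothing left to cancel, so `../a` quietly becomes `a`. Below is a sketch of a stricter variant that rejects such input instead. `normalize_path_strict` is a hypothetical name, and unlike the answer's function it also skips empty segments, so it is meant for relative paths only:

```php
<?php
// Hypothetical strict variant of the right-to-left scan.
function normalize_path_strict(string $str): string
{
    $pending = 0; // ".." segments that have not yet cancelled a segment
    $kept = [];
    $parts = explode('/', $str);
    for ($i = count($parts) - 1; $i >= 0; --$i) {
        $part = $parts[$i];
        if ($part === '' || $part === '.') {
            continue;            // empty or current-location segment
        }
        if ($part === '..') {
            $pending++;          // remember to cancel one segment to the left
        } elseif ($pending > 0) {
            $pending--;          // this segment is cancelled by a later ".."
        } else {
            $kept[] = $part;
        }
    }
    if ($pending > 0) {
        // Some ".." had no segment to cancel: the path leaves its start.
        throw new InvalidArgumentException("Path climbs above its starting point: $str");
    }
    return implode('/', array_reverse($kept));
}

echo normalize_path_strict('a/b/c/../../d'), "\n"; // a/d
```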
bortunac
-1
/**
 * Remove '.' and '..' path parts and make path absolute without
 * resolving symlinks.
 *
 * Examples:
 *
 *   resolvePath("test/./me/../now/", false);
 *   => test/now
 *   
 *   resolvePath("test///.///me///../now/", true);
 *   => /home/example/test/now
 *   
 *   resolvePath("test/./me/../now/", "/www/example.com");
 *   => /www/example.com/test/now
 *   
 *   resolvePath("/test/./me/../now/", "/www/example.com");
 *   => /test/now
 *
 * @access public
 * @param string $path
 * @param mixed $basePath resolve paths relatively to this path. Params:
 *                        STRING: prefix with this path;
 *                        TRUE: use current dir;
 *                        FALSE: keep relative (default)
 * @return string resolved path
 */
function resolvePath($path, $basePath=false) {
    // Make absolute path
    if (substr($path, 0, 1) !== DIRECTORY_SEPARATOR) {
        if ($basePath === true) {
            // Get PWD first to avoid getcwd() resolving symlinks if in symlinked folder
            $path=(getenv('PWD') ?: getcwd()).DIRECTORY_SEPARATOR.$path;
        } elseif (strlen($basePath)) {
            $path=$basePath.DIRECTORY_SEPARATOR.$path;
        }
    }

    // Resolve '.' and '..'
    $components=array();
    foreach(explode(DIRECTORY_SEPARATOR, rtrim($path, DIRECTORY_SEPARATOR)) as $name) {
        if ($name === '..') {
            array_pop($components);
        } elseif ($name !== '.' && !(count($components) && $name === '')) {
            // … && !(count($components) && $name === '') - we want to keep initial '/' for abs paths
            $components[]=$name;
        }
    }

    return implode(DIRECTORY_SEPARATOR, $components);
}
  • 2
    Some explanation alongside your code would be very helpful. – Graham Jun 01 '17 at 15:57
  • What kind of explanation did you mean? Examples? Or just emphasize the first comment line "Remove '.' and '..' path parts and make path absolute without resolving symlinks."? – Jan Filein Jun 03 '17 at 09:19
  • Take a look around SO and you'll see plenty of examples of this. This isn't just a code factory, but a place where people come to learn. There is a reason the system flagged your answer to longer-term users and asked for the community to come and help introduce you to the way things work here. – Graham Jun 03 '17 at 13:13
  • Look, I tried to be helpful. I don't want to waste time beating around the bush so be specific and answer my question. Answer of type "search on SO for the answer" is not an answer and it is not even helpful if one does not know what to search for. I explained the code in the code's comment and this code is an answer to original problem/question. So I ask again. What other "explanation" do you mean? Examples? More explanation? I am a pro that looks at the code and everything is clear. So if you are a newbie, tell me what part is unclear and I will try to update it with more explanation. – Jan Filein Jun 03 '17 at 16:41
  • 1
    Really man? That tour says I did everything all right. Sorry, but we are going in circles. Your complaints without explanations are just wasting my time. Go through referenced http://stackoverflow.com/tour and stick with the first rule "This site is all about getting answers. It's not a discussion forum. There's no chit-chat.". Thanks at least for trying to improve my first contribution but it was not helpful. Have a nice day. – Jan Filein Jun 04 '17 at 11:30
  • @JanFilein Hey! Thanks for your answer. I think what they are getting at is that while you have included a comment at the top of your function, it might be better as regular text. Perhaps explain how it's used, and what the arguments it takes do? – starbeamrainbowlabs Aug 22 '19 at 18:11