13

I already read about realpath(), but is there a function that I can pass a base directory and a filename that would give me the following result without resolving symlinks or checking whether files actually exist? Or do I have to use a modified realpath()?

"/var/", "../etc///././/passwd" => "/etc/passwd"
Lightness Races in Orbit
thejh
  • What should be the result of "/dir/a_random_synlink/../hello"? Remember that it may not be the same thing as "/dir/hello" if a_random_synlink doesn't point to a directory in the same directory – BatchyX Jan 23 '11 at 15:46
  • @BatchyX: Seems to be standard behaviour: `readlink -v -m '/home/user/linktoslashtmp/../'` returns `/home/user` – thejh Jan 23 '11 at 16:09
  • Maybe readlink does this, but the underlying OS does not. `ls /home/user/linktoslashtmp/../` lists the contents of `/` – BatchyX Jan 23 '11 at 16:20
  • @BatchyX is correct, performing this "normalisation" will mean that the before and after paths do not necessarily open the same file anymore. – caf Jan 23 '11 at 23:59
  • @BatchyX You have apparently buggy tools. `ls` works for me in the same way as `readlink`. You have to check sources of `readlink`. – 0andriy Dec 10 '15 at 19:55

3 Answers

10

Here is a normalize_path() function:

If the given path is relative, the function starts by prepending the current working directory to it.

Then the special path components (.., ., and empty components) are processed, and the result is returned.

For .., the last component is removed if there is one (/.. just returns /).
For . or an empty component (a doubled /), the component is simply skipped.

The function never returns an empty path (/ is returned instead). The result is allocated with malloc(), so the caller must free() it.

#define _GNU_SOURCE /* memrchr() */

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>

char * normalize_path(const char * src, size_t src_len) {

        char * res;
        size_t res_len;

        const char * ptr = src;
        const char * end = &src[src_len];
        const char * next;

        if (src_len == 0 || src[0] != '/') {

                // relative path

                char pwd[PATH_MAX];
                size_t pwd_len;

                if (getcwd(pwd, sizeof(pwd)) == NULL) {
                        return NULL;
                }

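                pwd_len = strlen(pwd);
                /* a cwd of "/" would otherwise yield a doubled leading slash */
                if (pwd_len == 1 && pwd[0] == '/') {
                        pwd_len = 0;
                }
                res = malloc(pwd_len + 1 + src_len + 1);
                if (res == NULL) {
                        return NULL;
                }
                memcpy(res, pwd, pwd_len);
                res_len = pwd_len;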
        } else {
                res = malloc((src_len > 0 ? src_len : 1) + 1);
                if (res == NULL) {
                        return NULL;
                }
                res_len = 0;
        }

        for (ptr = src; ptr < end; ptr=next+1) {
                size_t len;
                next = memchr(ptr, '/', end-ptr);
                if (next == NULL) {
                        next = end;
                }
                len = next-ptr;
                switch(len) {
                case 2:
                        if (ptr[0] == '.' && ptr[1] == '.') {
                                const char * slash = memrchr(res, '/', res_len);
                                if (slash != NULL) {
                                        res_len = slash - res;
                                }
                                continue;
                        }
                        break;
                case 1:
                        if (ptr[0] == '.') {
                                continue;
                        }
                        break;
                case 0:
                        continue;
                }
                res[res_len++] = '/';
                memcpy(&res[res_len], ptr, len);
                res_len += len;
        }

        if (res_len == 0) {
                res[res_len++] = '/';
        }
        res[res_len] = '\0';
        return res;
}
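
The question asks for a caller-supplied base directory rather than the current working directory. The same component-by-component walk can be adapted for that; the following is only a sketch under that assumption, not the answer author's code. The name normalize_from_base is hypothetical, and base is assumed to be an absolute path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the answer): lexically normalize `path`
 * against the absolute directory `base`. Symlinks are not resolved and the
 * filesystem is never touched. Returns a malloc()ed string, or NULL if
 * allocation fails.
 */
char *normalize_from_base(const char *base, const char *path)
{
    assert(base != NULL && path != NULL);

    size_t base_len = strlen(base);
    size_t path_len = strlen(path);

    /* Join as "base/path" unless path is already absolute. */
    size_t in_len = (path[0] == '/') ? path_len : base_len + 1 + path_len;
    char *in = malloc(in_len + 1);
    char *out = malloc(in_len + 2);   /* worst case: one '/' per component */
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return NULL;
    }
    if (path[0] == '/') {
        memcpy(in, path, path_len + 1);
    } else {
        memcpy(in, base, base_len);
        in[base_len] = '/';
        memcpy(in + base_len + 1, path, path_len + 1);
    }

    size_t out_len = 0;
    const char *p = in;
    const char *end = in + in_len;
    while (p < end) {
        const char *slash = memchr(p, '/', (size_t)(end - p));
        const char *stop = slash ? slash : end;
        size_t len = (size_t)(stop - p);

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            /* Drop the last component already written, if any. */
            while (out_len > 0 && out[--out_len] != '/') {
            }
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            /* Ordinary component: append "/name". */
            out[out_len++] = '/';
            memcpy(out + out_len, p, len);
            out_len += len;
        }
        p = slash ? slash + 1 : end;
    }

    if (out_len == 0) {
        out[out_len++] = '/';   /* never return an empty path */
    }
    out[out_len] = '\0';
    free(in);
    return out;
}
```

With this sketch, normalize_from_base("/var/", "../etc///././/passwd") yields "/etc/passwd". The caller owns the returned string and releases it with free().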
Arnaud Le Blanc
  • +1: That seems to work well for the case where the path is evaluated relative to the current directory. Strictly, I think the interpretation of the question is "evaluate the path `../etc///././passwd` relative to `/var/`", which is a simple variation on your theme (you don't need to establish the current directory with `getcwd()`; you use the value passed by the user). – Jonathan Leffler Jan 23 '11 at 16:02
  • Thanks, looks good - I modified the function a little bit to accept a pwd parameter. – thejh Jan 23 '11 at 17:44
  • @user576875 Can you give me the permission to use it in a GPL-licensed project if I write that the function was written by you? – thejh Jan 23 '11 at 18:00
  • Sure, I give you the permission – Arnaud Le Blanc Jan 23 '11 at 18:00
  • Thanks! I've modified it to be independent from system calls and fixed to correctly handle situations where cwd == '/' : https://gist.github.com/Eugeny/5127791 – Eugene Pankov Mar 10 '13 at 09:32
3
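function normalize_path($path, $pwd = '/') {
        // Relative paths are resolved against $pwd; absolute paths start at the root.
        if (!isset($path[0]) || $path[0] !== '/') {
                $path = "$pwd/$path";
        }
        $result = array();
        foreach (explode('/', $path) as $part) {
            if ($part === '' || $part === '.') {
                    continue;             // skip empty components and "."
            } elseif ($part === '..') {
                    array_pop($result);   // no-op when already at the root
            } else {
                    $result[] = $part;
            }
        }
        // Always emit the leading slash, so the root is "/" and never "".
        return '/' . implode('/', $result);
}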

(The question was tagged PHP at the time I wrote this.)

Anyway, here is a regex version:

function normalize_path($path, $pwd = '/') {
        if (!isset($path[0]) || $path[0] !== '/') {
                $path = "$pwd/$path";
        }
        return preg_replace('~
                ^(?P>sdotdot)?(?:(?P>sdot)*/\.\.)*
                |(?<sdotdot>(?:(?P>sdot)*/(?!\.\.)(?:[^/]+)(?P>sdotdot)?(?P>sdot)*/\.\.)+)
                |(?<sdot>/\.?(?=/|$))+
        ~sx', '', $path);
}
Jonathon Reinhart
Arnaud Le Blanc
  • Yes, it was tagged with no language, somebody put "php" on it and I changed it to "c" - sorry for forgetting that tag. – thejh Jan 23 '11 at 13:52
  • @user576875 @thejh My bad (I tagged it as PHP). Should have checked your recent questions first. Apologies to all. – John Parker Jan 23 '11 at 13:55
1

I use Hardex's solution:

#define _GNU_SOURCE /* memrchr() */

#include <string.h>

char * normalizePath(const char* pwd, const char * src, char* res) {
    size_t res_len;
    size_t src_len = strlen(src);

    const char * ptr = src;
    const char * end = &src[src_len];
    const char * next;

    if (src_len == 0 || src[0] != '/') {
        // relative path
        size_t pwd_len;

        pwd_len = strlen(pwd);
        memcpy(res, pwd, pwd_len);
        res_len = pwd_len;
    } else {
        res_len = 0;
    }

    for (ptr = src; ptr < end; ptr=next+1) {
        size_t len;
        next = (char*)memchr(ptr, '/', end-ptr);
        if (next == NULL) {
            next = end;
        }
        len = next-ptr;
        switch(len) {
        case 2:
            if (ptr[0] == '.' && ptr[1] == '.') {
                const char * slash = (char*)memrchr(res, '/', res_len);
                if (slash != NULL) {
                    res_len = slash - res;
                }
                continue;
            }
            break;
        case 1:
            if (ptr[0] == '.') {
                continue;
            }
            break;
        case 0:
            continue;
        }

        if (res_len != 1)
            res[res_len++] = '/';

        memcpy(&res[res_len], ptr, len);
        res_len += len;
    }

    if (res_len == 0) {
        res[res_len++] = '/';
    }
    res[res_len] = '\0';
    return res;
}

Example:

#include <stdio.h>

int main(){
    char path[FILENAME_MAX+1];
    printf("\n%s\n",normalizePath((char*)"/usr/share/local/apps",(char*)"./../../../",path));
    return 0;
}

Output:

/usr


Note:
  1. The first argument is the absolute directory path against which relative paths are normalized. It is generally the absolute path of the current directory.
  2. The second argument is the path to normalize; symlinks are not resolved.
  3. The third argument is a caller-supplied char* buffer that must be large enough to hold the normalized path.
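
How large the third argument needs to be is not stated, but a safe upper bound follows from how normalizePath() above writes its output: at most the bytes of pwd, one joining '/', every byte of src, and the terminating '\0'. A minimal sketch of that arithmetic, where normalized_capacity is a hypothetical helper name and not part of the answer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: an upper bound, in bytes, on what normalizePath()
 * can write into `res`, including the terminating '\0'.
 */
size_t normalized_capacity(const char *pwd, const char *src)
{
    assert(pwd != NULL && src != NULL);
    /* pwd, one joining '/', every byte of src, and the '\0'. */
    return strlen(pwd) + 1 + strlen(src) + 1;
}
```

A caller could allocate a buffer of normalized_capacity(pwd, src) bytes instead of relying on a fixed-size array such as FILENAME_MAX+1. The bound is deliberately loose, since ".." and "." components only ever shrink the result.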
Community
Jahid