8

I have researched this topic extensively but could not find anything substantial. By normalize/canonicalize I mean removing all the "..", ".", repeated slashes, etc. from a file path to get a simple absolute path, e.g.

"/rootdir/dir1/dir2/dir3/../././././dir4//////////" to "/rootdir/dir1/dir2/dir4"

On Windows I have GetFullPathName(), which gives me the canonical file path, but on Linux I cannot find an API that does the same job. realpath() exists, but it needs the path to be present on the file system before it can output a normalized path. For example, if /rootdir/dir1/dir2/dir4 is not on the file system, realpath() fails on the complex input above. Is there any way to get the normalized file path even if it does not exist on the file system?
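
The failure is easy to reproduce. A minimal sketch, where `try_realpath` is a hypothetical helper name used only for illustration and the missing directory is assumed not to exist on the machine:

```c
#define _XOPEN_SOURCE 700   /* exposes realpath() under strict -std=c11 */
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 0 and fills `out` on success; returns the errno value on failure. */
static int try_realpath(const char *path, char out[PATH_MAX]) {
    errno = 0;
    if (realpath(path, out) == NULL) {
        return errno;   /* ENOENT when a component does not exist */
    }
    return 0;
}
```

Calling it with a path such as "/no-such-dir-for-demo/a/../b" yields ENOENT instead of a normalized path.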

Yogesh
  • Just a side note: If `dir3` is a symlink to another directory, your normalization does not work as you expected it in the example. – dhke Oct 12 '15 at 10:36
  • Thanks @dhke , I understand that - but for my use case I will have directories on file system so that is not a worry for me. – Yogesh Oct 12 '15 at 10:41
  • Possible duplicate of [Getting absolute path of a file](http://stackoverflow.com/questions/229012/getting-absolute-path-of-a-file) – RedX Oct 12 '15 at 11:00
  • But beware that neither GetFullPathName nor realpath is truly canonical, as they don't respect symlinks and similar. – RedX Oct 12 '15 at 11:01
  • @RedX Thanks for mentioning it, I am aware of this fact. My use case is just to get a normalized path for any complex path, as I mentioned. Before posting this question I went through the link in your comment, but it does not seem to solve my problem, and continuing in that post could have been misleading, so I posted here with a proper subject. – Yogesh Oct 12 '15 at 11:25
  • But the real question of course is: does a non-existent file *have* a path? And in the case of canonical: could it have *two* paths? – joop Oct 12 '15 at 11:32
  • @joop I understand your query. My use case is to check whether an input file path is contained inside an existing preconfigured directory. Both the preconfigured directory path and the input file path can be written in very complex ways, so to check containment I need to simplify them. If the file path turns out to be inside the preconfigured path, I then have to create the file. I hope that makes clear why I need an absolute file path for a nonexistent file. A file path can be represented in multiple ways, like /home/dir1/dir2/file.txt OR /home/dir1/dir2/../dir2/file.txt – Yogesh Oct 12 '15 at 11:45
  • Well in that case: remove nodes, starting from the end, until an existing path is found. (this is essentially what `mkdir -p /some/path` does ...) BTW: what should happen if two directory entries link to the same inode? – joop Oct 12 '15 at 11:47
  • @joop Thanks for your replies. What exactly do you mean by removing nodes? Sorry if the question looks very naive, but I am in the learning phase. Are you pointing at something related to the inode table? If yes, can you please add a bit more on how to achieve that? – Yogesh Oct 12 '15 at 11:51
  • No, just in your path string: remove the last slash (replace it with a '\0') and anything after it, and see if the shortened path exists. rinse, repeat ... – joop Oct 12 '15 at 11:52
  • @joop I understand it can be done that way e.g. just parse through the file path, I could have used a stack and directory names can be stored in it starting from left to right in the path, encountering each ".." the stack can be popped and at the end recreate the path using the stack's content. But then the code could be really buggy I was hoping there should be some or the other solution already to check the containment of one path in another path. – Yogesh Oct 12 '15 at 12:03
  • Possible duplicate of [How to get absolute path of file or directory, that does \*not\* exist?](https://stackoverflow.com/questions/11034002/how-to-get-absolute-path-of-file-or-directory-that-does-not-exist) – Jan Rüegg May 26 '17 at 13:34
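
The stack idea from the comments, plus the containment check that motivates it, can be sketched in a few lines of C. This is a minimal sketch under stated assumptions, not an existing API: `normalize_abs_path` and `path_is_inside` are hypothetical names, the input must already be absolute, and the normalization is purely lexical, so symlinks are not resolved (see the caveat in the first comment). The output buffer itself serves as the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Lexically normalize an absolute path in place: collapse repeated slashes,
 * drop "." components, and let ".." remove the previous component.
 * Returns false if `path` is not absolute. Symlinks are NOT resolved.
 */
static bool normalize_abs_path(char *path) {
    if (path == NULL || path[0] != '/') {
        return false;
    }
    const char *src = path;
    char *dst = path;   /* write position; text before it is already normalized */

    while (*src != '\0') {
        while (*src == '/') {
            src++;                          /* skip a run of slashes */
        }
        const char *start = src;
        while (*src != '\0' && *src != '/') {
            src++;                          /* scan one component */
        }
        size_t len = (size_t)(src - start);

        if (len == 0 || (len == 1 && start[0] == '.')) {
            continue;                       /* empty or ".": nothing to push */
        }
        if (len == 2 && start[0] == '.' && start[1] == '.') {
            /* "..": pop the last component, but never climb above "/" */
            while (dst > path && *--dst != '/') {
            }
            continue;
        }
        *dst++ = '/';                       /* push the component */
        memmove(dst, start, len);
        dst += len;
    }
    if (dst == path) {
        *dst++ = '/';                       /* everything cancelled out: root */
    }
    *dst = '\0';
    return true;
}

/* True if normalized `child` equals or lies below normalized `parent`. */
static bool path_is_inside(const char *parent, const char *child) {
    size_t n = strlen(parent);
    if (strncmp(parent, child, n) != 0) {
        return false;
    }
    if (n == 1) {
        return true;    /* parent is "/": every absolute path is inside it */
    }
    /* Require a component boundary so "/srv/data2" is not inside "/srv/data". */
    return child[n] == '\0' || child[n] == '/';
}
```

Applied to the path from the question, `normalize_abs_path` rewrites the buffer to "/rootdir/dir1/dir2/dir4". Both arguments of `path_is_inside` are expected to have been normalized first.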

2 Answers

4

realpath(3) does not resolve missing filenames.
However, GNU core utilities (https://www.gnu.org/software/coreutils/) include a program, realpath(1), which is similar to the realpath(3) function but has this option:
-m, --canonicalize-missing    no components of the path need exist
In your own code, the same task can be done by the canonicalize_filename_mode() function from the file lib/canonicalize.c in the coreutils source.

y_ug
  • Can you add a bit more on how I can use it in my code? I have been trying to build on a Linux machine, but cannot include canonicalize.h – Yogesh Oct 13 '15 at 10:17
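
If pulling canonicalize.h into a build is not practical, behavior similar to `-m` can be approximated with realpath(3) alone. The following is a rough sketch, not the coreutils implementation: `canonicalize_missing` is a hypothetical name, the input is assumed to be absolute, and the result is limited to PATH_MAX bytes. It walks the path one component at a time, lets realpath(3) resolve each prefix that exists (symlinks included), and keeps missing components as plain text.

```c
#define _XOPEN_SOURCE 700   /* exposes realpath() under strict -std=c11 */
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Canonicalize an absolute path whose components need not exist.
 * Writes the result into `out`; returns 0 on success, -1 with errno set.
 */
static int canonicalize_missing(const char *path, char out[PATH_MAX]) {
    if (path == NULL || path[0] != '/') {
        errno = EINVAL;
        return -1;
    }
    char resolved[PATH_MAX];
    size_t len = 0;         /* length of the result so far; 0 means root */
    out[0] = '\0';

    const char *src = path;
    while (*src != '\0') {
        while (*src == '/') {
            src++;                              /* skip a run of slashes */
        }
        const char *start = src;
        while (*src != '\0' && *src != '/') {
            src++;                              /* scan one component */
        }
        size_t n = (size_t)(src - start);

        if (n == 0 || (n == 1 && start[0] == '.')) {
            continue;                           /* empty or ".": skip */
        }
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            /* "..": drop the last component; the prefix is already canonical */
            while (len > 0 && out[--len] != '/') {
            }
            out[len] = '\0';
            continue;
        }
        if (len + 1 + n >= PATH_MAX) {
            errno = ENAMETOOLONG;
            return -1;
        }
        out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
        out[len] = '\0';

        if (realpath(out, resolved) != NULL) {
            /* The prefix exists: replace it with its resolved form. */
            len = strlen(resolved);
            memcpy(out, resolved, len + 1);
            if (len == 1) {                     /* resolved to "/" itself */
                len = 0;
                out[0] = '\0';
            }
        } else if (errno != ENOENT && errno != ENOTDIR) {
            return -1;                          /* e.g. EACCES or ELOOP */
        }
        /* On ENOENT/ENOTDIR the component is kept as written. */
    }
    if (len == 0) {
        out[0] = '/';
        out[1] = '\0';
    }
    return 0;
}
```

The sketch calls realpath(3) once per component, which is simple rather than fast. Treating ENOENT and ENOTDIR as "missing" and failing on every other error is a design choice made here, not something taken from coreutils.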
3

canonicalize_filename_mode() from Gnulib is a great option, but it is GPL-licensed, which rules it out for closed-source commercial software.

We use the following implementation, which depends on the cwalk library:

#define _GNU_SOURCE

#include <unistd.h>
#include <stdlib.h>

#include "cwalk.h"

/* Extended version of canonicalize_file_name(3) that can handle nonexistent paths.
 * The caller must free() the returned string. */
static char *canonicalize_file_name_missing(const char *path) {
    /* Existing paths: let glibc resolve them, symlinks included. */
    char *resolved_path = canonicalize_file_name(path);
    if (resolved_path != NULL) {
        return resolved_path;
    }
    /* Missing files: fall back to lexical normalization against the cwd. */
    char *cwd = get_current_dir_name();
    if (cwd == NULL) {
        /* cannot detect current working directory */
        return NULL;
    }
    size_t resolved_path_len = cwk_path_get_absolute(cwd, path, NULL, 0);
    if (resolved_path_len == 0) {
        free(cwd); /* do not leak cwd on the error path */
        return NULL;
    }
    resolved_path = malloc(resolved_path_len + 1);
    if (resolved_path != NULL) {
        cwk_path_get_absolute(cwd, path, resolved_path, resolved_path_len + 1);
    }
    free(cwd);
    return resolved_path;
}
  • [cwk_path_normalize](https://likle.github.io/cwalk/reference/cwk_path_normalize.html) works as well if it doesn't have to be absolute – Julius Mar 03 '19 at 15:22