38

We are using boost::filesystem in our application. I have a 'full' path that is constructed by concatenating several paths together:

#include <boost/filesystem/operations.hpp>
#include <iostream>
     
namespace bf = boost::filesystem;

int main()
{
    bf::path root("c:\\some\\deep\\application\\folder");
    bf::path subdir("..\\configuration\\instance");
    bf::path cfgfile("..\\instance\\myfile.cfg");

    bf::path final ( root / subdir / cfgfile);

    std::cout << final.file_string();
}

The final path is printed as:

c:\some\deep\application\folder\..\configuration\instance\..\instance\myfile.cfg

This is a valid path, but when I display it to the user I'd prefer it to be normalized. (Note: I'm not even sure if "normalized" is the correct word for this). Like this:

c:\some\deep\application\configuration\instance\myfile.cfg

Earlier versions of Boost had a normalize() function - but it seems to have been deprecated and removed (without any explanation).

Is there a reason I should not use the BOOST_FILESYSTEM_NO_DEPRECATED macro? Is there an alternative way to do this with the Boost Filesystem library? Or should I write code that manipulates the path directly as a string?

Mike Willekes

6 Answers

31

Boost v1.48 and above

You can use boost::filesystem::canonical:

path canonical(const path& p, const path& base = current_path());
path canonical(const path& p, system::error_code& ec);
path canonical(const path& p, const path& base, system::error_code& ec);

http://www.boost.org/doc/libs/1_48_0/libs/filesystem/v3/doc/reference.html#canonical

v1.48 and above also provide the boost::filesystem::read_symlink function for resolving symbolic links.
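
A minimal sketch of how the `error_code` overload might be used. To keep it compilable without Boost it is written against `std::filesystem` (C++17), the standardized descendant of Boost.Filesystem; the assumption is that `boost::filesystem::canonical` and `boost::system::error_code` can be substituted one for one. The helper name `canonical_or_empty` is made up for illustration.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // assumption: swap in boost::filesystem when using Boost

// Hypothetical helper: returns the canonical form of p, or an empty path
// if p cannot be resolved (for example because it does not exist).
fs::path canonical_or_empty(const fs::path& p)
{
    std::error_code ec;                      // boost::system::error_code with Boost
    fs::path result = fs::canonical(p, ec);  // non-throwing overload
    return ec ? fs::path() : result;
}
```

The non-throwing overload is used here because a missing file is an expected outcome when the path is only being prepared for display.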

Boost versions prior to v1.48

As mentioned in other answers, you can't normalise because boost::filesystem can't follow symbolic links. However, you can write a function that normalises "as much as possible" (assuming "." and ".." are treated normally) because boost offers the ability to determine whether or not a file is a symbolic link.

That is to say, if the parent of the ".." is a symbolic link, you have to retain the ".."; otherwise it is probably safe to drop it. It is probably always safe to remove ".".

It's similar to manipulating the actual string, but slightly more elegant.

boost::filesystem::path resolve(
    const boost::filesystem::path& p,
    const boost::filesystem::path& base = boost::filesystem::current_path())
{
    boost::filesystem::path abs_p = boost::filesystem::absolute(p,base);
    boost::filesystem::path result;
    for(boost::filesystem::path::iterator it=abs_p.begin();
        it!=abs_p.end();
        ++it)
    {
        if(*it == "..")
        {
            // /a/b/.. is not necessarily /a if b is a symbolic link
            if(boost::filesystem::is_symlink(result) )
                result /= *it;
            // /a/b/../.. is not /a/b/.. under most circumstances
            // We can end up with ..s in our result because of symbolic links
            else if(result.filename() == "..")
                result /= *it;
            // Otherwise it should be safe to resolve the parent
            else
                result = result.parent_path();
        }
        else if(*it == ".")
        {
            // Ignore
        }
        else
        {
            // Just cat other path entries
            result /= *it;
        }
    }
    return result;
}
einpoklum
Adam Bowen
  • 2
    Note that `boost::filesystem::canonical()` requires a path that actually exists in the filesystem. So you cannot use it to normalize a path that may point to a filesystem item that *currently* does not exist, such as a path on a removable medium or a disconnected network drive. In these cases the function will report an error. Compare with [`boost::filesystem::path::lexically_normal`](https://www.boost.org/doc/libs/release/libs/filesystem/doc/reference.html#lexically_normal) – zett42 Jun 20 '19 at 18:08
  • Note that `canonical` has problems with Windows links and junctions, at least as of Boost 1.72. See https://github.com/boostorg/filesystem/issues Same for `weakly_canonical` and `read_symlink` – Evgen May 02 '20 at 00:18
19

With version 3 of boost::filesystem you can also try to remove all the symbolic links with a call to canonical. This can be done only for existing paths so a function that also works for non-existing ones would require two steps (tested on MacOS Lion and updated for Windows thanks to @void.pointer's comment):

boost::filesystem::path normalize(const boost::filesystem::path &path) {
    boost::filesystem::path absPath = absolute(path);
    boost::filesystem::path::iterator it = absPath.begin();
    boost::filesystem::path result = *it++;

    // Get canonical version of the existing part
    for (; it != absPath.end() && exists(result / *it); ++it) {
        result /= *it;
    }
    result = canonical(result);

    // For the rest remove ".." and "." in a path with no symlinks
    for (; it != absPath.end(); ++it) {
        // Just move back on ../
        if (*it == "..") {
            result = result.parent_path();
        }
        // Ignore "."
        else if (*it != ".") {
            // Just cat other path entries
            result /= *it;
        }
    }

    // Make sure the dir separators are correct even on Windows
    return result.make_preferred();
}
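
Boost 1.60 and later also provide `weakly_canonical`, which follows the same two-step idea: resolve the leading part of the path that exists, then normalize the remainder lexically. A minimal sketch, written against `std::filesystem` (C++17) so that it compiles without Boost, on the assumption that `boost::filesystem::weakly_canonical` behaves the same way; `normalize_maybe_missing` is a hypothetical name:

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // assumption: boost::filesystem (1.60+) offers the same call

// Hypothetical wrapper: normalizes a path whose tail may not exist yet.
// Falls back to a purely lexical normalization if the filesystem reports an error.
fs::path normalize_maybe_missing(const fs::path& p)
{
    std::error_code ec;
    fs::path result = fs::weakly_canonical(p, ec);
    if (ec)
        result = fs::absolute(p).lexically_normal();
    // Use the platform's preferred separator throughout, as in the function above.
    return result.make_preferred();
}
```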
jarzec
  • 1
    Sorry, a `++` was missing in line 4 above. – jarzec Oct 09 '12 at 14:30
  • 6
    `canonical` works only for existing files. I needed something that also works for non-existing paths (`canonical` is used by `normalize` for the existing bit of the path). – jarzec Apr 11 '13 at 19:13
  • This doesn't work right on Windows. If I pass in `"E:\\foo\\.\\bar"`, I get back `"E:/foo\\bar"`. The slashes are inconsistent. Change the `return` expression to `return result.make_preferred()` and it fixes the issue. Now I get `"E:\\foo\\bar"`. – void.pointer Aug 29 '19 at 17:41
  • @void.pointer Thanks a lot. I never had a chance to test this on Windows. – jarzec Sep 04 '19 at 21:10
  • Typo in "make_prefered()" in the example. Also note that canonical has problems with Windows links and junctions, at least as of Boost 1.72. See https://github.com/boostorg/filesystem/issues – Evgen May 02 '20 at 00:13
  • @Evgen Thanks. I fixed the typo. – jarzec May 03 '20 at 18:49
14

Your complaints and/or wishes about canonical have been addressed by Boost 1.60 with the path member function

path lexically_normal() const;
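
Applied to the path from the question, usage might look like the sketch below. It uses `std::filesystem` (C++17), which has a member function of the same name, so that it compiles without Boost; forward slashes are used so the example behaves the same on POSIX and Windows. `build_config_path` is a made-up name.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: with Boost 1.60+, use boost::filesystem instead

// Hypothetical helper reproducing the concatenation from the question and
// collapsing "." and ".." purely lexically (no filesystem access).
std::string build_config_path()
{
    fs::path root("c:/some/deep/application/folder");
    fs::path subdir("../configuration/instance");
    fs::path cfgfile("../instance/myfile.cfg");

    fs::path full = root / subdir / cfgfile;
    return full.lexically_normal().generic_string();
}
```

Because the normalization is purely lexical, the result can name a different file than the original path if a component in front of a `..` is a symbolic link.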
Alexander Shukaev
7

The explanation is at http://www.boost.org/doc/libs/1_40_0/libs/filesystem/doc/design.htm :

Work within the realities described below.

Rationale: This isn't a research project. The need is for something that works on today's platforms, including some of the embedded operating systems with limited file systems. Because of the emphasis on portability, such a library would be much more useful if standardized. That means being able to work with a much wider range of platforms that just Unix or Windows and their clones.

where the "reality" applicable to removal of normalize is:

Symbolic links cause canonical and normal form of some paths to represent different files or directories. For example, given the directory hierarchy /a/b/c, with a symbolic link in /a named x pointing to b/c, then under POSIX Pathname Resolution rules a path of "/a/x/.." should resolve to "/a/b". If "/a/x/.." were first normalized to "/a", it would resolve incorrectly. (Case supplied by Walter Landry.)

The library cannot really normalize a path without access to the underlying filesystem, which makes the operation a) unreliable, b) unpredictable, c) wrong, d) all of the above.
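
The quoted case can be reproduced on a system that allows creating symbolic links (on Windows this may need extra privileges). The sketch below is written against `std::filesystem` (C++17) so that it compiles without Boost; the scratch directory name and the function name are made up for illustration.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Builds a/b/c plus a symlink a/x -> b/c in a scratch directory, then checks
// whether lexical normalization of "a/x/.." names the same directory as real
// resolution through the filesystem.
// Returns true when the two disagree (the case from the design document).
bool lexical_normalization_disagrees()
{
    // Hypothetical scratch location; anything already there is deleted.
    fs::path scratch = fs::canonical(fs::temp_directory_path()) / "gr_symlink_demo";
    fs::remove_all(scratch);
    fs::create_directories(scratch / "a" / "b" / "c");
    // Relative target: resolved relative to the directory containing the link.
    fs::create_directory_symlink(fs::path("b") / "c", scratch / "a" / "x");

    fs::path p = scratch / "a" / "x" / "..";
    fs::path lexical = p.lexically_normal();  // collapses to scratch/a
    fs::path resolved = fs::canonical(p);     // follows the link: scratch/a/b

    bool disagrees = !fs::equivalent(lexical, resolved);
    fs::remove_all(scratch);
    return disagrees;
}
```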

Community
just somebody
  • 1
    I think wanting to normalize the path is sane, natural, and expected behaviour. Looks like they have over-thought this one and erred on the side of wrong. – Kieveli Nov 17 '09 at 03:27
  • 3
    Boost.Filesystem is aiming at inclusion in the C++ standard, which is why they removed the features that are useful on only *some* of the platforms. There's already a de facto *and* de jure standard for the feature you're longing for; it's realpath() in POSIX: "The realpath() function shall derive, from the pathname pointed to by file_name, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.' , '..' , or symbolic links."
        % cd /home/foo/tmp
        % ln -s foo ..
        % echo $PWD/foo/..
        /home/foo/tmp/foo/..
        % realpath $PWD/foo/..
        /home/foo
    – just somebody Nov 17 '09 at 04:06
  • This part of symbolic links always bugged me, that's quite a violation of the Principle of Least Astonishment :/ – Matthieu M. Nov 17 '09 at 13:49
  • which part? AFAICS "this part" is the whole point of symlinks, no? – just somebody Nov 17 '09 at 14:28
  • At the very least the macro to re-enable this functionality should have been called BOOST_FILESYSTEM_NOT_NECESSARILY_PORTABLE (or something like that). Calling the code 'deprecated' makes one think that it could be dropped from a future release. – Mike Willekes Nov 18 '09 at 21:42
  • 1
    Sucks majorly, interesting to claim "this isn't a research project" and then pretty much directly after come up with an excuse which leads everyone to believe that it is. Surely a better solution would've been to just implement it in terms of for example realpath() on posix, and whatever is needed on windows, and then on unsupported platforms throw an exception? – Ylisar Mar 22 '11 at 14:09
  • not sure why this answer has gotten a downvote as it's a copy/paste straight from the horse's mouth. – just somebody Sep 17 '12 at 13:27
  • This doesn't really answer the question. – zett42 Jun 20 '19 at 18:41
3

It's still there. Keep using it.

I imagine they deprecated it because symbolic links mean that the collapsed path isn't necessarily equivalent. If c:\full\path were a symlink to c:\rough, then c:\full\path\.. would be c:\, not c:\full.

Jonathan Graehl
0

Since the "canonical" function works only with paths that exist, I made my own solution that splits the path into its parts and compares every part with the next one. I'm using this with Boost 1.55.

typedef boost::filesystem::path PathType;

template <template <typename T, typename = std::allocator<T> > class Container>
Container<PathType> SplitPath(const PathType& path)
{
    Container<PathType> ret;
    long the_size = std::distance(path.begin(),path.end());
    if(the_size == 0)
        return Container<PathType>();
    ret.resize(the_size);
    std::copy(path.begin(),path.end(),ret.begin());
    return ret;
}

PathType NormalizePath(const PathType& path)
{
    PathType ret;
    std::list<PathType> splitPath = SplitPath<std::list>(path);
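    // The root element of an absolute path is skipped; ".." never cancels it.
    const bool skipRoot = path.is_absolute();
    std::list<PathType>::iterator it = splitPath.begin();
    if(skipRoot)
        ++it;
    while(it != splitPath.end())
    {
        std::list<PathType>::iterator it_next = it;
        ++it_next;
        if(it_next == splitPath.end())
            break;
        if(*it_next == ".." && *it != "..")
        {
            it = splitPath.erase(it); // drop the directory name
            it = splitPath.erase(it); // drop the ".." that cancels it
            // Step back one element so that a following ".." can cancel
            // the previous name as well (e.g. "a/b/../..").
            std::list<PathType>::iterator first = splitPath.begin();
            if(skipRoot)
                ++first;
            if(it != first)
                --it;
        }
        else
            ++it;
    }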
    for(std::list<PathType>::iterator it = splitPath.begin(); it != splitPath.end(); ++it)
    {
        ret /= *it;
    }
    return ret;
}

Here's an example of how to call it:

std::cout<<NormalizePath("/home/../home/thatfile/")<<std::endl;
The Quantum Physicist