
I'm writing some code to manage a custom on-disk file structure and synchronize it to unconnected systems. One of my requirements is to be able to estimate the size of a sync before actually generating the sync content. As a simple solution, I've put together a map with full-path filenames as the key for efficient lookup of already scanned content.

I run into problems with this when the same file in my file structure is referenced from different places in different ways. For example:

C:\DataSource\files\samplefile.txt
C:\DataSource\data\samples\..\..\files\samplefile.txt
C:\DataSource\etc\..\files\samplefile.txt

These three path strings all reference the same file on disk, but their string representations differ. If I drop these into a map verbatim, I'll count the size of samplefile.txt three times, and my estimate will be wrong.
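To make the over-counting concrete, here is a minimal sketch of a map keyed on the raw path string. `EstimateVerbatim` and its `fileSize` parameter are hypothetical names used only for illustration; `fileSize` stands in for a real size lookup so the example does not need to touch the disk.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: sums sizes using the raw path string as the map key.
// `fileSize` is a stand-in for a real size lookup (an assumption of this sketch).
std::uintmax_t EstimateVerbatim( const std::vector<std::string>& paths,
                                 std::uintmax_t fileSize,
                                 std::map<std::string, std::uintmax_t>& seen )
{
    for( const std::string& p : paths )
    {
        // Different spellings of the same file become different keys.
        seen.emplace( p, fileSize );
    }

    std::uintmax_t total = 0;
    for( const auto& entry : seen )
    {
        total += entry.second;
    }
    return total;
}
```

Fed the three spellings above, the map ends up with three entries, so a single file is counted three times.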

In an attempt to find a way around this, I was hoping boost::filesystem::path provided a function to reduce or simplify a path, but I didn't see anything of the sort. Using the path decomposition table and path iterators, I wrote up the following function (for use in a Windows environment):

#include <string>
#include <boost/filesystem.hpp>

namespace bfs = boost::filesystem;

// Lexically removes "." and ".." elements; the file system is never consulted.
std::string ReducePath( const std::string& Path )
{
    bfs::path input( Path );
    bfs::path result;
    for( bfs::path::iterator it = input.begin( ), endIt = input.end( ); it != endIt; ++it )
    {
        if( (*it) == ".." )
        {
            // Remove the leaf directory.
            result = result.parent_path( );
        }
        else if( (*it) == "." )
        {
            // Just ignore.
        }
        else
        {
            // Append the element to the end of the current result.
            result /= (*it);
        }
    }

    // string( ) already yields a std::string; no c_str( ) round trip is needed.
    return result.string( );
}

I have two questions.

One, is there a standard function that provides this sort of functionality, or does this already exist in boost or another library somewhere?

Two, I'm not entirely confident that the function I wrote will work in all cases, and I'd like some more eyes on it. It works in my tests. Does anyone see a case where it'll break down?

Perculator
  • They call this creating a "canonical" path to a file. Please change your title and your question to use the more common buzzword. – S.Lott Sep 23 '09 at 16:55
  • One scenario where this might produce a resulting path that doesn't necessarily refer to the same file system object as the original path is if a path component that gets removed as a result of being followed by a double-dot is actually a link to another directory. Imagine if in your third example, *C:\DataSource\etc* were actually a symbolic link to *D:\tmp\someDir*. If you were to simplify *C:\DataSource\etc\..\files\samplefile.txt* to *C:\DataSource\files\samplefile.txt*, that might not actually refer to the same file. But you could always check for that via boost::filesystem::is_symlink. – antred Jun 01 '15 at 14:19
  • Never mind, I only just realized this question was asked 6 years ago and that boost::filesystem now has a canonical() function it probably didn't have back then. – antred Jun 01 '15 at 14:35

2 Answers


There is a function in Boost.Filesystem:

bool equivalent(const Path1& p1, const Path2& p2);

It checks whether two paths refer to the same file. That would be ideal, except that there is no corresponding < operator (and perhaps there cannot be), so it can't provide the ordering a map key needs.

Does anyone see a case where this will break ...

Maybe: if you have input like "../test.txt", parent_path() might not do what you want. I would recommend completing the path first.

See "complete" in the filesystem library.
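To illustrate the relative-path case, here is a sketch that ports the loop from the question's ReducePath to C++17 `std::filesystem` so it compiles without Boost. `Reduce` and `ReduceAbsolute` are hypothetical names, and `std::filesystem::absolute` is used here as the assumed counterpart of the "complete" function mentioned above.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Same loop as ReducePath from the question, ported to std::filesystem.
fs::path Reduce( const fs::path& input )
{
    fs::path result;
    for( const fs::path& part : input )
    {
        if( part == ".." )
        {
            // On an empty result this is a no-op, so a leading ".." is lost.
            result = result.parent_path( );
        }
        else if( part != "." )
        {
            result /= part;
        }
    }
    return result;
}

// Hypothetical wrapper: make the path absolute first, then reduce it.
fs::path ReduceAbsolute( const fs::path& input )
{
    return Reduce( fs::absolute( input ) );
}
```

With the plain loop, "../test.txt" collapses to "test.txt", which names a file in the current directory instead of its parent. Making the path absolute first gives the ".." a real directory to cancel against.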

Good luck --Robert Nelson

StayOnTarget

While not an exact dupe, this question will help: Best way to determine if two path reference to same file in Windows?

Community
daveb