11

What is the best way extract the common file path from the list of file path strings in c#?

Eg: I have a list 5 file paths in List variable, like below

c:\abc\pqr\tmp\sample\b.txt
c:\abc\pqr\tmp\new2\c1.txt
c:\abc\pqr\tmp\b2.txt
c:\abc\pqr\tmp\b3.txt
c:\abc\pqr\tmp\tmp2\b2.txt

output should be c:\abc\pqr\tmp

Mahender
  • 5,554
  • 7
  • 38
  • 54
  • Is the brute force method not useful for some reason? – Matt Mills Dec 20 '11 at 15:54
  • Do you need to consider relative and non-normalized (i.e. C:\abc\..\abc\pqr\tmp), mapped (`SUBST`) or linked paths as well? – Christian.K Dec 20 '11 at 15:57
  • possible duplicate of [Find common prefix of strings](http://stackoverflow.com/questions/2070356/find-common-prefix-of-strings) – Fischermaen Dec 20 '11 at 16:01
  • Prefix is the key word here. Try this: http://stackoverflow.com/a/2070434/862973 hope this helps. – ldgorman Dec 20 '11 at 16:00
  • Take a look at this [SO post](http://stackoverflow.com/questions/2070356/find-common-prefix-of-strings), which solves the same problem in a number of different ways. For easy Googling, this is the "longest common substring" problem. – Michael Kingsmill Dec 20 '11 at 15:56
  • what about `c:\files\awesome.txt` and `c:\files\awful.txt`? Or what about `c:\a\files\awesome.txt` and `c:\b\files\awesome.txt`? Both of those cases will report the wrong values. – corsiKa Dec 20 '11 at 16:02

9 Answers9

17

Because everything is best solved with LINQ*:
*not everything is best solved with LINQ.

using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<string> Files = new List<string>()
        {
            @"c:\abc\pqr\tmp\sample\b.txt",
            @"c:\abc\pqr\tmp\new2\c1.txt",
            @"c:\abc\pqr\tmp\b2.txt",
            @"c:\abc\pqr\tmp\b3.txt",
            @"c:\a.txt"
        };

        var MatchingChars =
            from len in Enumerable.Range(0, Files.Min(s => s.Length)).Reverse()
            let possibleMatch = Files.First().Substring(0, len)
            where Files.All(f => f.StartsWith(possibleMatch))
            select possibleMatch;

        var LongestDir = Path.GetDirectoryName(MatchingChars.First());
    }
}

Explanation:

The first line gets a list of lengths of possible matches to evaluate. We want the longest possibility first (so i reverse the enumeration which would be 0, 1, 2, 3; turning it into 3, 2, 1, 0).

I then get the string to match, which is simply a substring of the first entry of the given length.

I then filter the results, to ensure we only include possible matches that all files start with.

Finally, i return the first result, which will be the longest substring and call path.getdirectoryname to ensure if something has a few identical letters in the filenames it isn't included.

Community
  • 1
  • 1
George Duckett
  • 31,770
  • 9
  • 95
  • 162
  • 1
    I love the comment at the top! :) – corsiKa Dec 20 '11 at 16:06
  • Could you add a description of what each of the 3 lines in the LINQ code do? I'm a bit confused by the call to `Reverse()` on the first line especially. – Stuart Leyland-Cole Dec 20 '11 at 16:18
  • 7
    If there are two paths c:\oog\pong and c:\oog\pongle this will return c:\oog\pong, it should just return c:\oog\. It needs to check on a directory by directory basis rather than a character by character basis. – Mongus Pong Dec 20 '11 at 17:23
  • Clever idea though. Shouldn't be hard to fix the above. – Mongus Pong Dec 20 '11 at 17:24
  • @mongus, that's what the line with the path call does. – George Duckett Dec 20 '11 at 17:48
  • Ah I'll have to have a closer at that after the cheesy period is over! – Mongus Pong Dec 23 '11 at 20:42
  • Although this code seems to work for files, it's not so when it comes to arbitrary strings (in my case it's nested folder names on a mail server). For "Inbox/In-inbox/In-in-box" and "Inbox/In-inbox" pair it will find that their common part is "Inbox/In-inbo", not "Inbox/In-inbox". I fixed this by adding +1 when building the range: Enumerable.Range(0, Files.Min(s => s.Length) + 1) – Alex Nov 10 '16 at 17:16
  • The solution looks pretty elegant, but I don't like the fact that it creates at least a couple of arrays that depending on the size of the original array could be a bad design decision and pretty expensive. – Allan.C Mar 03 '17 at 17:12
  • _The first line gets a list of lengths of possible matches to evaluate_ Don't want to be petty, but the first line imports `System.Collections.Generic` ;) – Clijsters Dec 06 '17 at 13:07
3

I don't think there's a "trick"to get past using the brute force method to do this comparison, but you can optimize your solution somewhat:

Prepare:
Find the shortest string

Repeat:
See if all of the other strings contain it
If so, you're done
If not, remove one or more characters
Matt Mills
  • 8,692
  • 6
  • 40
  • 64
  • C:\WorldsLongestFileNameIsntTheShortestStringNecessarilyButItIsAtTheRoot.Wrong .. but not if you shorten it by one character and try again until it disappears so that works. oh. – CAD bloke Apr 19 '16 at 09:07
1

I would use a loop and on each string I'd split it with s.Split('\').

Then it iterate over the first element, and next element, saving them as I go.

When I find one that is different, I can return the last iteration's result.

string commonPath(string[] paths) {
    // this is a Java notation, I hope it's right in C# as well? Let me know!
    string[][] tokens = new string[paths.length][];

    for(int i = 0; i < paths.Length; i++) {
        tokens[i] = paths.Split('\\');
    }

    string path = "";

    for(int i = 0; i < tokens[0].Length; i++) {
        string current = tokens[0][i];
        for(int j = 1; j < tokens.Length; j++) {
            if(j >= tokens[i].Length) return path;
            if(current != tokens[i][j]) return path;
        }
        path = path + current + '\\';
    }
    return path; // shouldn't reach here, but possible on corner cases
}
corsiKa
  • 81,495
  • 25
  • 153
  • 204
1

Here is a quick implementation that just loops through the list of strings and then compares the characters at the beginning of the strings trimming from the original string:

List<string> list1 = new List<string>();
list1.Add(@"c:\abc\pqr\tmp\sample\b.txt");
list1.Add(@"c:\abc\pqr\tmp\new2\c1.txt");
list1.Add(@"c:\abc\pqr\tmp\b2.txt");
list1.Add(@"c:\abc\pqr\tmp\b3.txt");
list1.Add(@"c:\abc\pqr\tmp\tmp2\b2.txt");

string baseDir = "";
foreach (var item in list1)
{
    if (baseDir == "")
        baseDir = System.IO.Path.GetDirectoryName(item);
    else
    {
        int index = 0;
        string nextDir = System.IO.Path.GetDirectoryName(item);
        while (index< baseDir.Length && index<nextDir.Length && 
            baseDir[index] == nextDir[index])
        {
            index++;
        }
        baseDir = baseDir.Substring(0, index);
    }
}
MessageBox.Show(baseDir);
John Koerner
  • 37,428
  • 8
  • 84
  • 134
  • dammit, I just spent like 5 minutes coding, commenting and testing the same thing, only to find you here... still +1, because, you know, same idea :) – neeKo Dec 20 '11 at 16:11
  • This implementation has one drawback. If you have this source `c:\abc\pqr\tmp; c:\abc\pqr\tmb; c:\abc\pqr\tmc` Then the common path is like `c:\abc\pqr\tm`. – Martin Aug 23 '16 at 08:04
1

Keep a tally of each segment and keep going while all the paths start with the tally.

Implementation with captured variable and TakeWhile:

void Main()
{
    string[] paths = new[] { @"c:\abc\pqr\tmp\sample\b.txt",
                            @"c:\abc\pqr\tmp\new2\c1.txt",
                            @"c:\abc\pqr\tmp\b2.txt",
                            @"c:\abc\pqr\tmp\b3.txt",
                            @"c:\abc\pqr\tmp\tmp2\b2.txt"};
                            
    var test = new List<string>();
    var common = paths[0].Split('\\').TakeWhile ( segment => 
    {
        test.Add ( segment );
        return paths.All ( path => path.StartsWith ( String.Join ("\\", test )  + "\\") ) ;
    } );
 
    Console.WriteLine ( String.Join ("\\", common ) ); // c:\abc\pqr\tmp
}

Implementation without captured variable and Reduce:

void Main()
{
    string[] paths = new[] { @"c:\abc\pqr\tmp\sample\b.txt",
                            @"c:\abc\pqr\tmp\new2\c1.txt",
                            @"c:\abc\pqr\tmp\b2.txt",
                            @"c:\abc\pqr\tmp\b3.txt",
                            @"c:\abc\pqr\tmp\tmp2\b2.txt"};
                            
    var common = paths[0].Split('\\').Aggregate("", (test, segment) => {
        var prefix = test + segment + "\\";
        return paths.All( path => path.StartsWith ( prefix ) ) ? prefix : test;
    });

    Console.WriteLine( common ); // c:\abc\pqr\tmp\
}
TWiStErRob
  • 44,762
  • 26
  • 170
  • 254
Mongus Pong
  • 11,337
  • 9
  • 44
  • 72
  • Nice one with `TakeWhile`, a tiny optimization: if you do the join in advance, it doesn't have to be done for every path. – TWiStErRob May 21 '22 at 14:58
1

Use the first path as the iterator seed:

using System;
using System.Collections.Generic;
using System.IO;

namespace stackoverflow1
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            List<String> paths=new List<String>();
            paths.Add(@"c:\abc\pqr\tmp\sample\b.txt");
            paths.Add(@"c:\abc\pqr\tmp\new2\c1.txt");
            paths.Add(@"c:\abc\pqr\tmp\b2.txt");
            paths.Add(@"c:\abc\pqr\tmp\b3.txt");
            paths.Add(@"c:\abc\pqr\tmp\tmp2\b2.txt");

            Console.WriteLine("Found: "+ShortestCommonPath(paths));

        }

        private static String ShortestCommonPath(IList<String> list)
        {
            switch (list.Count)
            {
            case 0: return null;
            case 1: return list[0];
            default:
                String s=list[0];
                while (s.Length>0)
                {
                    bool ok=true;
                    for (int i=1;i<list.Count;i++)
                    {
                        if (!list[i].StartsWith(s))
                        {
                            ok=false;
                            int p=s.LastIndexOf(Path.DirectorySeparatorChar);
                            if (p<0) return "";
                            s=s.Substring(0,p);
                            break;
                        }
                    }
                    if (ok) break;
                }
                return s;
            }
        }

    }
}
Eugen Rieck
  • 64,175
  • 10
  • 70
  • 92
0

The selected solution does not work properly if one of the paths is exactly the directory name that should be returned. I think there should be:

from len in Enumerable.Range(0, matchingNames.Min(s => s.Length) + 1).Reverse()

Ota Milink
  • 38
  • 1
  • 5
0

The top answer fails for identical paths like e.g.:

string str1 = @"c:\dir\dir1\dir2\dir3";
string str2 = @"c:\dir\dir1\dir2\dir3";

This is better: Find common prefix of strings

however

.TakeWhile(s => s.All(d => d == s.First()))

should be

.TakeWhile(s =>
      {
          var reference = s.First();
          return s.All(d => string.Equals(reference, d, StringComparison.OrdinalIgnoreCase));
      })
Dr.Gee
  • 59
  • 3
0

You could break the paths down into segments (i.e. split by the backslash), then build back a segment at a time and compare the results until you find the end of the match. I doubt that is the best way, but it would work.

Samuel Slade
  • 8,405
  • 6
  • 33
  • 55