14

I got a list of files and directories List<string> pathes. Now I'd like to calculate the deepest common branch every path is sharing with each other.

We can assume that they all share a common path, but this is unknown in the beginning.

Let's say I have the following three entries:

  • C:/Hello/World/This/Is/An/Example/Bla.cs
  • C:/Hello/World/This/Is/Not/An/Example/
  • C:/Hello/Earth/Bla/Bla/Bla

This should get the result: C:/Hello/ as Earth is breaking this "chain" of subdirectories.

Second example:

  • C:/Hello/World/This/Is/An/Example/Bla.cs
  • C:/Hello/World/This/Is/Not/An/Example/

-> C:/Hello/World/This/Is/

How would you proceed? I tried to use string.split(@"/") and start with the first string and check if every part of this array is contained in the other strings. However, this would be a very expensive call as I'm iterating (list_of_entries)^list_of_entries. Is there any better solution available?

My current attempt would be something like the following (C# + LINQ):

    public string CalculateCommonPath(IEnumerable<string> paths)
    {
        int minSlash = int.MaxValue;
        string minPath = null;
        foreach (var path in paths)
        {
            int splits = path.Split('\\').Count();
            if (minSlash > splits)
            {
                minSlash = splits;
                minPath = path;
            }
        }

        if (minPath != null)
        {
            string[] splits = minPath.Split('\\');
            for (int i = 0; i < minSlash; i++)
            {
                if (paths.Any(x => !x.StartsWith(splits[i])))
                {
                    return i >= 0 ? splits.Take(i).ToString() : "";
                }
            }
        }
        return minPath;
    }
Frame91
  • 3,670
  • 8
  • 45
  • 89
  • 1
    http://stackoverflow.com/questions/8578110/how-to-extract-common-file-path-from-list-of-file-paths-in-c-sharp, http://stackoverflow.com/questions/2070356/find-common-prefix-of-strings - the answer is quite easy to find... – Vojtěch Dohnal Jul 21 '14 at 14:02

7 Answers7

10

A function to get the longest common prefix may look like this:

public static string GetLongestCommonPrefix(string[] s)
{
    int k = s[0].Length;
    for (int i = 1; i < s.Length; i++)
    {
        k = Math.Min(k, s[i].Length);
        for (int j = 0; j < k; j++)
            if (s[i][j] != s[0][j])
            {
                k = j;
                break;
            }
    }
    return s[0].Substring(0, k);
}

Then you may need to cut the prefix on the right hand. E.g. we want to return c:/dir instead of c:/dir/file for

c:/dir/file1
c:/dir/file2

You also may want to normalize the paths before processing. See Normalize directory names in C#.

Community
  • 1
  • 1
AlexD
  • 32,156
  • 3
  • 71
  • 65
  • Thanks, exactly what I was looking for, could you maybe update your answer and add the sourcecode to it so people can see the answer directly? I needed to call this website via smartphone as my company is blocking this site. – Frame91 Jul 21 '14 at 14:04
  • 3
    Might be worth converting the source to C# since this isn't Java related. – DGibbs Jul 21 '14 at 14:17
  • Gave it a try wit c:/dirOne/SomeOtherDir c:/dirOne/SoNotCommonDir and I get c:/DirOne/So which is wrong – ManniAT Oct 14 '18 at 18:50
  • @ManniAT I think I commented this case, see: " _Then you may need to cut the prefix on the right hand._ " and below. The function `GetLongestCommonPrefix` is for strings in general. – AlexD Oct 14 '18 at 23:10
7

I dont know whether this is the best performing solution (probably not), but it surely is very easy to implement.

  • Sort your list alphabetically
  • compare the first entry in that sorted list to the last in that list, character by character, and terminate when you find a difference (the value before the termination is the longest shared substring of both those strings)

Sample Fiddle

Sample code:

List<string> paths = new List<string>();

paths.Add(@"C:/Hello/World/This/Is/An/Example/Bla.cs");
paths.Add(@"C:/Hello/World/This/Is/Not/An/Example/");
paths.Add(@"C:/Hello/Earth/Bla/Bla/Bla");

List<string> sortedPaths = paths.OrderBy(s => s).ToList();

Console.WriteLine("Most common path here: {0}", sharedSubstring(sortedPaths[0], sortedPaths[sortedPaths.Count - 1]));

And that function of course:

public static string sharedSubstring(string string1, string string2)
{
    string ret = string.Empty;

    int index = 1;
    while (string1.Substring(0, index) == string2.Substring(0, index))
    {
        ret = string1.Substring(0, index);
        index++;
    }

    return ret;
} // returns an empty string if no common characters where found
DrCopyPaste
  • 4,023
  • 1
  • 22
  • 57
  • Fails when comparing only 2 folders with same first letter – Pakk Feb 05 '18 at 20:20
  • @Pakk can you demonstrate that with a fiddle or sample values? If I edit to only include `paths.Add(@"C:/Hello/World/This/Is/An/Example/Bla.cs");` and `paths.Add(@"C;/Hillo/World/This/Is/Not/An/Example/");` I get the expected result: Most common string here: `C` – DrCopyPaste Feb 06 '18 at 09:20
  • 1
    in hindsight - I see the function works properly just that I was thinking of a valid path instead of most common string as a result. Apologies (my example which does indeed show the most common substring: https://dotnetfiddle.net/bcxasj) – Pakk Feb 06 '18 at 15:04
2

First sort the list with the paths to inspect. Then you can split and compare the first and the last item - if they are same proceed to the next dimension until you find a difference.

So you just need to sort once and then inspect two items.

Alex
  • 5,240
  • 1
  • 31
  • 38
2

To return c:/dir for

c:/dir/file1
c:/dir/file2

I would code it this way:

public static string GetLongestCommonPrefix(params string[] s)
{
    return GetLongestCommonPrefix((ICollection<string>)s);
}

public static string GetLongestCommonPrefix(ICollection<string> paths)
{
    if (paths == null || paths.Count == 0)
        return null;


    if (paths.Count == 1)
        return paths.First();

    var allSplittedPaths = paths.Select(p => p.Split('\\')).ToList();

    var min = allSplittedPaths.Min(a => a.Length);
    var i = 0;
    for (i = 0; i < min; i++)
    {
        var reference = allSplittedPaths[0][i];
        if (allSplittedPaths.Any(a => !string.Equals(a[i], reference, StringComparison.OrdinalIgnoreCase)))
        {
            break;
        }
    }

    return string.Join("\\", allSplittedPaths[0].Take(i));
}

And here are some tests for it:

[TestMethod]
public void GetLongestCommonPrefixTest()
{
    var str1 = @"C:\dir\dir1\file1";
    var str2 = @"C:\dir\dir1\file2";
    var str3 = @"C:\dir\dir1\file3";
    var str4 = @"C:\dir\dir2\file3";
    var str5 = @"C:\dir\dir1\file1\file3";
    var str6 = @"C:\dir\dir1\file1\file3";


    var res = Utilities.GetLongestCommonPrefix(str1, str2, str3);

    Assert.AreEqual(@"C:\dir\dir1", res);

    var res2 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str4);

    Assert.AreEqual(@"C:\dir", res2);

    var res3 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str5);

    Assert.AreEqual(@"C:\dir\dir1", res3);

    var res4 = Utilities.GetLongestCommonPrefix(str5, str6);

    Assert.AreEqual(@"C:\dir\dir1\file1\file3", res4);

    var res5 = Utilities.GetLongestCommonPrefix(str5);

    Assert.AreEqual(str5, res5);

    var res6 = Utilities.GetLongestCommonPrefix();

    Assert.AreEqual(null, res6);

    var res7 = Utilities.GetLongestCommonPrefix(null);

    Assert.AreEqual(null, res7);
}
Dr.Gee
  • 59
  • 3
  • Ouh... I just realized you don't want to do this because it might be expensive. But to be honest... Do you still code on a machine with just 128kB of RAM? – Dr.Gee Aug 28 '18 at 13:48
1

I would iterate over each character in the first path, comparing it with every character in every path (except the first) in the collection of paths:

public string FindCommonPath(List<string> paths)
{
    string firstPath = paths[0];
    bool same = true;

    int i = 0;

    string commonPath = string.Empty;

    while (same && i < firstPath.Length)
    {
        for (int p = 1; p < paths.Count && same; p++)
        {
            same = firstPath[i] == paths[p][i];
        }

        if (same)
        {
            commonPath += firstPath[i];
        }
        i++;
    }

    return commonPath;
}

You could iterate through the list first to find the shortest path and possibly improve it slightly.

Andrew Whitaker
  • 124,656
  • 32
  • 289
  • 307
  • Thanks for your answer! As far as I can see this would also cost O (n) same as AlexDs solution. Do you see any performance difference? – Frame91 Jul 21 '14 at 14:06
  • I believe it performs the same. It's not *O(n)* though, it's really *O(n*m)* where *m* is the length of the shortest substring. The outer loop iterates *m* times and the inner loop iterates *n* times. – Andrew Whitaker Jul 21 '14 at 14:13
  • Yeah, it's O (n*m), first thought that you could reduce O (n * x) to O (n) but you're right, thanks! – Frame91 Jul 21 '14 at 14:22
1

The function that gives you the longest common directory path with best possible complexity:

private static string GetCommonPath(IEnumerable<string> files)
{
    // O(N, L) = N*L; N  - number of strings, L - string length

    // if the first and last path from alphabetic order matches, all paths in between match
    string first = null;//smallest string
    string last = null;//largest string

    var comparer = StringComparer.InvariantCultureIgnoreCase;
    // find smallest and largest string:
    foreach (var file in files.Where(p => !string.IsNullOrWhiteSpace(p)))
    {
        if (last == null || comparer.Compare(file, last) > 0)
        {
            last = file;
        }

        if (first == null || comparer.Compare(file, first) < 0)
        {
            first = file;
        }
    }

    if (first == null)
    {
        // the list is empty
        return string.Empty;
    }

    if (first.Length > last.Length)
    {
        // first should not be longer
        var tmp = first;
        first = last;
        last = tmp;
    }

    // get minimal length
    var count = first.Length;
    var found = string.Empty;

    const char dirChar = '\\';
    var sb = new StringBuilder(count);
    for (var idx = 0; idx < count; idx++)
    {
        var current = first[idx];
        var x = char.ToLowerInvariant(current);
        var y = char.ToLowerInvariant(last[idx]);

        if (x != y)
        {
            // first and last string character is different - break
            return found;
        }

        sb.Append(current);

        if (current == dirChar)
        {
            // end of dir character
            found = sb.ToString();
        }
    }

    if (last.Length >= count && last[count] == dirChar)
    {
        // whole first is common root:
        return first;
    }

    return found;
}
  • I'm not sure what you consider "best possible complexity" and how would you prove it? If it's about computational complexity (not code complexity / readability), I'd say that your idea would probably be easily improved by ditching the `StringBuilder` and just holding on to the last found index (and computing and returning a sub-string up to that index only once at the end). Another doubt would be around finding the first and last string, as it hides a lot of operations behind the `comparer.Compare`. (And you are using the expensive InvariantCultureIgnoreCase instead of e.g. OrdinalIgnoreCase.) – Hilarion Jul 29 '21 at 17:52
0

This is considerably more optimized than splitting paths by slash and comparing them:

private static string FindCommonPath(string[] paths) {
    var firstPath = paths[0];
    var commonPathLength = firstPath.Length;

    for (int i = 1; i < paths.Length; i++)
    {
        var otherPath = paths[i];
        var pos = -1;
        var checkpoint = -1;

        while (true)
        {
            pos++;

            if (pos == commonPathLength)
            {
                if (pos == otherPath.Length
                    || (pos < otherPath.Length
                        && (otherPath[pos] == '/' || otherPath[pos] == '\\')))
                {
                    checkpoint = pos;
                }
                break;
            }

            if (pos == otherPath.Length)
            {
                if (pos == commonPathLength
                || (pos < commonPathLength
                    && (firstPath[pos] == '/' || firstPath[pos] == '\\')))
                {
                    checkpoint = pos;
                }
                break;
            }

            if ((firstPath[pos] == '/' || firstPath[pos] == '\\')
                && (otherPath[pos] == '/' || otherPath[pos] == '\\'))
            {
                checkpoint = pos;
                continue;
            }

            var a = char.ToLowerInvariant(firstPath[pos]);
            var b = char.ToLowerInvariant(otherPath[pos]);

            if (a != b)
                break;
        }

        if (checkpoint == 0 && (firstPath[0] == '/' || firstPath[0] == '\\'))
            commonPathLength = 1;
        else commonPathLength = checkpoint;

        if (commonPathLength == -1 || commonPathLength == 0)
            return "";
    }

    return firstPath.Substring(0, commonPathLength);
}
Legend2k
  • 1
  • 1
  • 1