
I have two List<FileInfo> lists, SourceFiles and DestFiles. I want to build a LINQ query that will return a list of the items whose filenames are in Source but not in Dest, i.e. a left join.

My data set for SourceFiles is:

folder1\a.txt
folder1\b.txt
folder1\c.txt
folder1\d.txt

DestFiles is:

folder2\a.txt
folder2\b.txt
folder2\c.txt

so the query should return folder1\d.txt.

Following the MSDN example, I've tried using LINQ syntax:

var queryX = from s in SourceFiles
             join d in DestFiles
             on s.Name equals d.Name
             into SourceJoinDest
             from joinRow in SourceJoinDest.DefaultIfEmpty()
             select new
             {
                 joinRow.FullName
             };

and using extension methods:

var query = SourceFiles.GroupJoin(DestFiles,
                                    source => source.Name,
                                    dest => dest.Name,
                                    (source,dest) => new
                                    {
                                        path = source.FullName
                                    }).Select(x => x.path.DefaultIfEmpty())

But neither of these works: the LINQ syntax version throws "Object reference not set to an instance of an object", and the extension method version returns "Enumeration yielded no results".

I realize that these queries are only returning sets of FullName properties and not the full FileInfo objects; I have code that takes each FullName and returns a FileInfo, and does this for each item in the query to rebuild the list. But if there's a way to return a FileInfo directly from the query, that would be great.

sigil
  • What's the difference between FullName and Name? Does FullName include the whole path, or does Name have that as well? – Corey Adler Dec 26 '12 at 20:26
  • Those are detailed in the [definition](http://msdn.microsoft.com/en-us/library/system.io.fileinfo.aspx) for the `FileInfo` class: `Name` is just the filename and extension, while `FullName` includes the full path – sigil Dec 26 '12 at 20:42
  • Doh! Forgot that it was a .NET class. I thought it was custom. – Corey Adler Dec 26 '12 at 20:46

3 Answers


I don't think Join is the ideal tool here; what you're really after is an Except. The built-in Except has no overload that takes key-selector lambdas, so to use it you would have to write your own IEqualityComparer. You can get the same result with Where and Any, though:

var excepts = SourceFiles.Where(c => !DestFiles.Any(p => p.Name == c.Name)).ToList();

Or, to get just the full paths, add a Select at the end:

var excepts = SourceFiles.Where(c => !DestFiles.Any(p => p.Name == c.Name))
                         .Select(f => f.FullName).ToList();
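
Note that Where plus Any rescans DestFiles for every source file. For large lists, one possible variation is to collect the destination names into a HashSet first. The sketch below is only an illustration: the class and method names are made up, and the case-insensitive comparer is an assumption that suits Windows file names (the query above compares case-sensitively).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MissingFiles
{
    // Returns the source files whose Name has no counterpart in dest.
    // StringComparer.OrdinalIgnoreCase is an assumption, not part of the original query.
    public static List<FileInfo> NotInDest(IEnumerable<FileInfo> source, IEnumerable<FileInfo> dest)
    {
        // Build the lookup once so each source file needs only a hash lookup.
        var destNames = new HashSet<string>(dest.Select(d => d.Name),
                                            StringComparer.OrdinalIgnoreCase);
        return source.Where(s => !destNames.Contains(s.Name)).ToList();
    }

    public static void Main()
    {
        // FileInfo objects can be created for paths that do not exist on disk.
        var sourceFiles = new[] { "a.txt", "b.txt", "c.txt", "d.txt" }
            .Select(n => new FileInfo(Path.Combine("folder1", n))).ToList();
        var destFiles = new[] { "a.txt", "b.txt", "c.txt" }
            .Select(n => new FileInfo(Path.Combine("folder2", n))).ToList();

        foreach (var f in NotInDest(sourceFiles, destFiles))
            Console.WriteLine(f.Name); // prints d.txt
    }
}
```

This returns the FileInfo objects themselves, so no second pass is needed to rebuild them from FullName.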

I would suggest writing extension methods for a quick keyed Except and Intersect:

public static IEnumerable<U> Except<R, S, T, U>(this IEnumerable<R> mainList, 
                                                IEnumerable<S> toBeSubtractedList,
                                                Func<R, T> mainListFunction, 
                                                Func<S, T> toBeSubtractedListFunction,
                                                Func<R, U> resultSelector)
{
    return EnumerateToCheck(mainList, toBeSubtractedList, mainListFunction, 
                            toBeSubtractedListFunction, resultSelector, false);
}

static IEnumerable<U> EnumerateToCheck<R, S, T, U>(IEnumerable<R> mainList, 
                                                   IEnumerable<S> secondaryList,
                                                   Func<R, T> mainListFunction, 
                                                   Func<S, T> secondaryListFunction,
                                                   Func<R, U> resultSelector,
                                                   bool ifFound)
{
    foreach (var r in mainList)
    {
        bool found = false;
        foreach (var s in secondaryList)
        {
            if (object.Equals(mainListFunction(r), secondaryListFunction(s)))
            {
                found = true;
                break;
            }
        }

        if (found == ifFound)
            yield return resultSelector(r);
    }

    // Or maybe just replace the loop with:
    // return mainList.Where(r => secondaryList.Any(s => object.Equals(mainListFunction(r), secondaryListFunction(s))) == ifFound)
    //                .Select(r => resultSelector(r));
    // but I like the verbose way; it's easier to debug.
}

public static IEnumerable<U> Intersect<R, S, T, U>(this IEnumerable<R> mainList, 
                                                   IEnumerable<S> toIntersectList,
                                                   Func<R, T> mainListFunction,
                                                   Func<S, T> toIntersectListFunction,
                                                   Func<R, U> resultSelector)
{
    return EnumerateToCheck(mainList, toIntersectList, mainListFunction, 
                            toIntersectListFunction, resultSelector, true);
}

Now, in your case, you can just write:

var excepts = SourceFiles.Except(DestFiles, p => p.Name, p => p.Name, p => p.FullName)
                         .ToList();
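
If you are on .NET 6 or later, the framework's own Enumerable.ExceptBy covers the same ground without a custom extension. A minimal sketch, assuming the same two lists (the class and method names here are hypothetical); note that ExceptBy has set semantics, so source files sharing a name are returned only once:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExceptByDemo
{
    // Requires .NET 6+: keeps the source files whose Name is not among the dest names.
    public static List<FileInfo> OnlyInSource(List<FileInfo> source, List<FileInfo> dest)
    {
        return source.ExceptBy(dest.Select(d => d.Name), s => s.Name).ToList();
    }

    public static void Main()
    {
        var sourceFiles = new[] { "a.txt", "b.txt", "c.txt", "d.txt" }
            .Select(n => new FileInfo(Path.Combine("folder1", n))).ToList();
        var destFiles = new[] { "a.txt", "b.txt", "c.txt" }
            .Select(n => new FileInfo(Path.Combine("folder2", n))).ToList();

        foreach (var f in OnlyInSource(sourceFiles, destFiles))
            Console.WriteLine(f.Name); // prints d.txt
    }
}
```
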
nawfal
  • Works great, and very concise. LINQ has been the most difficult thing for me to learn in .NET; can you recommend any sites that have well-written tutorials about lambda/extension trickery like this? – sigil Dec 26 '12 at 21:04
  • @sigil To be honest, I felt the same when I started with LINQ and lambdas, but once you get the hang of it, it's simple and a very useful tool to know. To start with LINQ, you can begin [with our own Jon Skeet's material](https://msmvps.com/blogs/jon_skeet/archive/tags/Edulinq/default.aspx), which is really good. Knowing lambdas and `Func` will be important there, so start from [this SO thread](http://stackoverflow.com/questions/167343/c-sharp-lambda-expression-why-should-i-use-this). Don't forget to read a bit about `deferred execution` when learning LINQ. – nawfal Dec 26 '12 at 21:06
  • Hm, I'm seeing some odd behavior with the unit test on this. I've made an expected result `List<FileInfo> expectedResult = new List<FileInfo>(); expectedResult.Add(new FileInfo(@"u:\folder1\d.txt"));` which should be identical to the result in `excepts`. But when I compare them using `CollectionAssert.AreEquivalent(result, expectedResult);` the test fails. The paths are the same, so why does the test fail? (I can post this as a new question if need be.) – sigil Dec 26 '12 at 21:58
  • Not sure, sigil. We could give you a better idea if you post it as another question with a proper sample data set. – nawfal Dec 26 '12 at 22:06

Instead of using a join, you might be able to handle this with .Except():

var enumerable = sourceFiles.Except(destFiles, new FileInfoComparer<FileInfo>((f1, f2)=>f1.Name == f2.Name, f=>f.Name.GetHashCode()));

.Except() takes an IEqualityComparer<T> which you can write yourself or use a wrapper that takes a lambda.

    class FileInfoComparer<T> : IEqualityComparer<T>
    {
        public FileInfoComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
        {
            _equals = equals;
            _getHashCode = getHashCode;
        }

        readonly Func<T, T, bool> _equals;
        public bool Equals(T x, T y)
        {
            return _equals(x, y);
        }

        readonly Func<T, int> _getHashCode;
        public int GetHashCode(T obj)
        {
            return _getHashCode(obj);
        }
    } 

Running it against the sample data returns a single FileInfo object, the one for "d.txt".
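
If you would rather write the comparer yourself than use the lambda wrapper, a dedicated class might look like the sketch below. FileNameComparer and ComparerDemo are names made up for this illustration, and the case-sensitive ordinal comparison simply mirrors the `==` used above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical hand-written comparer: two FileInfo objects are equal when their Name matches.
public sealed class FileNameComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    // Must agree with Equals: equal names produce equal hash codes.
    public int GetHashCode(FileInfo obj)
    {
        return obj.Name.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var sourceFiles = new[] { "a.txt", "b.txt", "c.txt", "d.txt" }
            .Select(n => new FileInfo(Path.Combine("folder1", n))).ToList();
        var destFiles = new[] { "a.txt", "b.txt", "c.txt" }
            .Select(n => new FileInfo(Path.Combine("folder2", n))).ToList();

        foreach (var f in sourceFiles.Except(destFiles, new FileNameComparer()))
            Console.WriteLine(f.Name); // prints d.txt
    }
}
```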

Mark Coleman

You almost had it, but you need to keep only those source files that have no matching destination files:

var query = from s in SourceFiles
            join d in DestFiles
                on s.Name equals d.Name into g
            where !g.Any() // empty group!
            select s;
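
As for the exception in the question: DefaultIfEmpty() yields null for a source file that has no match, so joinRow.FullName dereferences null on exactly the rows you want. If you prefer to keep the left-join shape, one way to sketch it is to filter on that null and select the source item instead. The class and method names below are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LeftJoinDemo
{
    // Left join, then keep only the rows where the right side is missing.
    public static List<FileInfo> Unmatched(List<FileInfo> source, List<FileInfo> dest)
    {
        var query = from s in source
                    join d in dest on s.Name equals d.Name into matches
                    from match in matches.DefaultIfEmpty()
                    where match == null   // null means no destination file with this name
                    select s;             // select the source item, never the null match
        return query.ToList();
    }

    public static void Main()
    {
        var sourceFiles = new[] { "a.txt", "b.txt", "c.txt", "d.txt" }
            .Select(n => new FileInfo(Path.Combine("folder1", n))).ToList();
        var destFiles = new[] { "a.txt", "b.txt", "c.txt" }
            .Select(n => new FileInfo(Path.Combine("folder2", n))).ToList();

        foreach (var f in Unmatched(sourceFiles, destFiles))
            Console.WriteLine(f.Name); // prints d.txt
    }
}
```
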
Sergey Berezovskiy