1

I'm looking for help with a LINQ query that uses the .Contains() method of a List(Of T) to return the elements that are not contained in a second List(Of T). The comparison is based on a property of a property of T.

Here is some sample code that I wrote up. The scenario is fictitious, but the concept is the same.

Module Module1

    Sub Main()
        ' Get all Files in a directory that contain `.mp` in the name
        Dim AllFiles As List(Of IO.FileInfo) = New IO.DirectoryInfo("C:\Test\Path").GetFiles("*.mp*").ToList
        Dim ValidFiles As New List(Of fileStruct)

        ' Get all Files that actually have an extension of `.mp3`
        AllFiles.ForEach(Sub(x) If x.Extension.Contains("mp3") Then ValidFiles.Add(New fileStruct(prop1:=x.Name, path:=x.FullName)))

        ' Attempting to get all files that are not listed in the Valid files list
        Dim InvalidFiles As IO.FileInfo() = From file As IO.FileInfo In AllFiles Where Not ValidFiles.Contains(Function(x As fileStruct) x.fleInfo.FullName = file.FullName) Select file
        ' Errors on the `.Contains()` method because I have no idea what I'm doing and I am basically guessing at this point

        'Here is the same but instead using the `.Any()` Method
        Dim InvalidFiles As IO.FileInfo() = From file As IO.FileInfo In AllFiles Where Not ValidFiles.Any(Function(x As fileStruct) x.fleInfo.FullName = file.FullName) Select file
        ' This doesn't error out, but all files are returned
    End Sub

    Public Structure fileStruct
        Private _filePath As String
        Private _property1 As String

        Public ReadOnly Property property1 As String
            Get
                Return _property1
            End Get
        End Property

        Public ReadOnly Property fleInfo As IO.FileInfo
            Get
                Return New IO.FileInfo(_filePath)
            End Get
        End Property

        Public Sub New(ByVal prop1 As String, ByVal path As String)
            _property1 = prop1
            _filePath = path
        End Sub
    End Structure
End Module

3 Answers

1

This is a more or less direct implementation of the MP3 files list in the question. I did use a FileItem class instead of a structure. The good part is afterwards:

' note: EnumerateFiles
Dim AllFiles As List(Of IO.FileInfo) = New IO.DirectoryInfo("M:\Music").
    EnumerateFiles("*.mp*", IO.SearchOption.AllDirectories).ToList()

Dim goofyFilter As String() = {"g", "h", "s", "a"}

' filter All files to those starting with the above (lots of
' Aerosmith, Steely Dan and Heart)
Dim ValidFiles As List(Of FileItem) = AllFiles.
                Where(Function(w) goofyFilter.Contains((w.Name.ToLower)(0))).
                Select(Function(s) New FileItem(s.FullName)).ToList()

Dim invalid As List(Of FileInfo)

invalid = AllFiles.Where(Function(w) Not ValidFiles.
                        Any(Function(a) w.FullName = a.FilePath)).ToList()

This is much the same as Sam's answer except with your file/mp3 usage. AllFiles has 809 items, ValidFiles has 274. The resulting invalid list is 535.


Now, let's speed it up 50-60x:

Same starting code for AllFiles and ValidFiles:

Dim FileItemValid = Function(s As String)
                        Dim valid As Boolean = False
                        For Each fi As FileItem In ValidFiles
                            If fi.FilePath = s Then
                                valid = True
                                Exit For
                            End If
                        Next
                        Return valid
                    End Function

invalid = AllFiles.Where(Function(w) FileItemValid(w.FullName) = False).ToList()

With a Stopwatch, the results are:

    Where/Any count: 535, time: 572ms  
FileItemValid count: 535, time: 9ms

You get similar results with a plain old For/Each loop that calls an IsValid function.


If you do not need the other FileInfo members, you could build AllFiles as a list of the same type you are receiving. That lets you compare item against item and use Except and Contains:

AllFiles2 = Directory.EnumerateFiles("M:\Music", "*.mp3", IO.SearchOption.AllDirectories).
            Select(Function(s) New FileItem(s)).ToList()

Now you can use Contains with middling results:

invalid2 = AllFiles2.Where(Function(w) Not ValidFiles.Contains(w)).ToList()

This also allows you to use Except which is simpler and faster:

invalid2 = AllFiles2.Except(ValidFiles).ToList()

 Where/Contains count: 535, time: 74ms
         Except count: 535, time: 3ms
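
Except and Contains compare whole FileItem objects, so they only give these results if FileItem defines value equality. The answer does not show the class, so the following is a minimal sketch of what it might look like. The FilePath property name comes from the code above; the case-insensitive comparison and everything else here are assumptions.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical FileItem: the answer does not show its definition.
Public Class FileItem
    Implements IEquatable(Of FileItem)

    ' Read-only auto-property assigned in the constructor (needs VB 14 or later).
    Public ReadOnly Property FilePath As String

    Public Sub New(path As String)
        FilePath = If(path, String.Empty)
    End Sub

    ' Two items are equal when their paths match, ignoring case (an assumption).
    Public Overloads Function Equals(other As FileItem) As Boolean _
        Implements IEquatable(Of FileItem).Equals
        Return other IsNot Nothing AndAlso
            String.Equals(FilePath, other.FilePath, StringComparison.OrdinalIgnoreCase)
    End Function

    Public Overloads Overrides Function Equals(obj As Object) As Boolean
        Return Equals(TryCast(obj, FileItem))
    End Function

    ' Must agree with Equals, so hash with the same case-insensitive rule.
    Public Overrides Function GetHashCode() As Integer
        Return StringComparer.OrdinalIgnoreCase.GetHashCode(FilePath)
    End Function
End Class

Module ExceptSketch
    Sub Main()
        ' Made-up paths standing in for AllFiles2 and ValidFiles.
        Dim allItems As New List(Of FileItem) From {
            New FileItem("C:\Music\a.mp3"), New FileItem("C:\Music\b.mp3")}
        Dim validItems As New List(Of FileItem) From {New FileItem("c:\music\a.mp3")}

        ' Except uses FileItem's Equals/GetHashCode to decide what to drop.
        Dim invalidItems = allItems.Except(validItems).ToList()
        Console.WriteLine(invalidItems.Count)   ' prints 1
    End Sub
End Module
```

Without the Equals and GetHashCode overrides, a class is compared by reference, and two FileItem instances built from the same path would not match.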

Even if you need other FileInfo members later, you can easily fetch them from the file name.

Ňɏssa Pøngjǣrdenlarp
  • But still have multiple iterations of AllFiles (one time for the valid files, one time for the invalid files) – Daniel Bişar Oct 22 '15 at 12:48
  • No. AllFiles is iterated once testing if each is in ValidFiles. That results in at least a partial iteration of ValidFiles, but *invalid(files)* is the result. The IL code for a For/Each loop and the Where and anon lambda is basically identical – Ňɏssa Pøngjǣrdenlarp Oct 22 '15 at 13:30
  • Plus One for Steely Dan. Definitely belongs in the Valid Files list. –  Oct 22 '15 at 15:54
  • Also, if you want to really speed things up, use a HashSet. http://stackoverflow.com/questions/2728500/hashsett-versus-dictionaryk-v-w-r-t-searching-time-to-find-if-an-item-exist –  Oct 22 '15 at 16:53
  • Actually 3-4ms is about as fast as it seems to go. A HashSet for ValidFiles speeds up Where/Contains from 74 to 4ms, but it is slower than some of the others (by only a very little). The problem for the OP may be using a HashSet with them, since they come from elsewhere. – Ňɏssa Pøngjǣrdenlarp Oct 22 '15 at 17:19
  • I used the lambda-in-a-variable option in my project; however, to make the results from my COM object match on the path, I had to use `.Normalize()` and `.Equals()` on the paths: `If fi.LocalPackage.FullName.Normalize.Equals(inFle.FullName.Normalize, StringComparison.OrdinalIgnoreCase) Then` – вʀaᴎᴅᴏƞ вєнᴎєƞ Oct 26 '15 at 16:34
  • The string handling makes sense considering COM (we of course can't tell that from the question). – Ňɏssa Pøngjǣrdenlarp Oct 26 '15 at 17:38
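
The HashSet idea and the case-insensitive path matching from the comments above can be combined. This is only a sketch under assumptions: the paths are made up, plain strings stand in for the FileInfo and COM-supplied items, and OrdinalIgnoreCase is assumed to be the right comparison for the paths involved.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module HashSetSketch
    Sub Main()
        ' Stand-ins for AllFiles and ValidFiles; these paths are invented.
        Dim allPaths As New List(Of String) From {
            "C:\Music\a.mp3", "C:\Music\b.mp4", "C:\Music\c.mp3"}
        Dim validPaths As New List(Of String) From {
            "c:\music\A.MP3", "C:\Music\c.mp3"}

        ' Build the lookup once, ignoring case when comparing paths.
        Dim validSet As New HashSet(Of String)(validPaths, StringComparer.OrdinalIgnoreCase)

        ' Each Contains call is a hash lookup rather than a scan of the list.
        Dim invalid As List(Of String) =
            allPaths.Where(Function(p) Not validSet.Contains(p)).ToList()

        For Each p In invalid
            Console.WriteLine(p)   ' prints C:\Music\b.mp4
        Next
    End Sub
End Module
```

With the lists from the question, the same filter would read something like `AllFiles.Where(Function(f) Not validSet.Contains(f.FullName))`, with validSet built from the full paths of the valid items.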
0

As others have noted, .Except() is a better approach but here is an answer to your question:

List<int> list1 = new List<int> { 1, 2, 3 };

List<int> list2 = new List<int> { 3, 4, 5 };

List<int> list3 = list1.Where(list1value => !list2.Contains(list1value)).ToList();  // 1, 2

Based on the comments, here is an example using different types. This query uses .Any():

List<Product> list1 = new List<Product> { ... };

List<Vendor> list2 = new List<Vendor> { ... };

List<Product> list3 = list1.Where(product => !list2.Any(vendor => product.VendorID == vendor.ID)).ToList();  


// list3 will contain products with a vendorID that does not match the ID of any vendor in list2.
0

Simply use Except as CraigW suggested. You have to do some projections (select) to get it done.

Dim InvalidFiles as IO.FileInfo() = AllFiles.Select(Function(p) p.FullName).Except(ValidFiles.Select(Function(x) x.fleInfo.FullName)).Select(Function(fullName) New IO.FileInfo(fullName)).ToArray()

Note: this code is neither efficient nor very readable, but it works.

But I would go for something like this:

Dim AllFiles As List(Of IO.FileInfo) = New IO.DirectoryInfo("C:\MyFiles").GetFiles("*.mp*").ToList
Dim ValidFiles As New List(Of fileStruct)
Dim InvalidFiles as New List(Of FileInfo)

For Each fileInfo As FileInfo In AllFiles
    If fileInfo.Extension.Contains("mp3") Then 
        ValidFiles.Add(New fileStruct(prop1:=fileInfo.Name, path:=fileInfo.FullName))
    Else 
        InvalidFiles.Add(fileInfo)
    End If
Next

Simple, fast and readable.

Daniel Bişar
  • I am receiving a list of valid files back in a struct from a COM object and then I need to compare those to files in a directory and what I need is the difference between all the files and the files returned by the COM object – вʀaᴎᴅᴏƞ вєнᴎєƞ Oct 22 '15 at 13:35