3

I have a List(of String). For example: {"C1", "C12", "C10", "C1", "C6", "C22", "C1", "C6"}. I am trying to write a function to give me a list of duplicates: {"C1", "C6"} in the list. Each duplicate will be listed only once. The function I wrote does give me anything back at all. I can't figure out why. Any help or alternative approaches are appreciated. FYI, I see a very similar question in C# but I don't know how to translate that syntax into VB.net since I am not up to speed on LINQ yet. It is here: How to get duplicate items from a list using LINQ?

    ''' <summary>
    ''' Given a List(Of String), returns a list of items that are duplicated in the list.
    ''' Each duplicate returned is unique.
    ''' </summary>
    ''' <param name="Set1"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function GetDuplicateItems(ByVal Set1 As List(Of String)) As List(Of String)
        Dim DistinctItems As IEnumerable(Of String)
        'Dim DistinctResults As New List(Of String)
        Dim DuplicateItems As IEnumerable(Of String)
        Dim ItemsToReturn As New List(Of String)

        'Get a set of unique items in the list
        DistinctItems = Set1.Select(Function(x) x).Distinct()
        'Do I need to enumerate the result in order to force the thing to execute?
        'See remarks section of http://msdn.microsoft.com/en-us/library/bb300779.aspx
        'For Each Item As String In DistinctItems
        '    DistinctResults.Add(Item)
        'Next
        'Do a set subtraction (Set1 - UniqueItems)
        DuplicateItems = Set1.Except(DistinctItems)

        For Each Item As String In DuplicateItems
            ItemsToReturn.Add(Item)
        Next

        Return ItemsToReturn
    End Function
Community
  • 1
  • 1
tullynyguy
  • 45
  • 1
  • 1
  • 4

1 Answers1

10

This has nothing to do with deferred execution, your algorithm is just wrong: Distinct does not return the unique items of the list, it just removes duplicates. Example: {"C1", "C12", "C10", "C1", "C6", "C22", "C1", "C6"}.Distinct() yields {"C1", "C12", "C10", "C6", "C22"} in your case. Thus, Set1.Except(DistinctItems) will always result in an empty list.


Here's an alternative solution to your problem. It selects all items whose count in the list is greater than one:

Dim duplicates = list.Where(Function(x) list.Where(Function(y) x = y).Count() > 1).Distinct()

Usage example:

Dim list As New List(Of String) From {"a", "a", "b", "c", "c"}
Dim duplicates = list.Where(Function(x) list.Where(Function(y) x = y).Count() > 1).Distinct()
' duplicates now contains {"a", "c"}

EDIT: Alternative solution using GroupBy (inspired by aquinas):

Dim duplicates = list.GroupBy(Function(x) x).Where(Function(x) x.Count > 1).Select(Function(x) x.Key)
Heinzi
  • 167,459
  • 57
  • 363
  • 519
  • 1
    That works, but it's O(N^2). Perhaps a group by would make more sense in this case? – aquinas May 31 '12 at 15:16
  • @aquinas: Good point, I've added an alternative using group by (now the complexity depends on the implementation of GroupBy ;-)). – Heinzi May 31 '12 at 15:24
  • @aquinas: Oops, just saw that you added your own answer. Rollback and +1. ;-) – Heinzi May 31 '12 at 15:25
  • 2
    Heh. Oops. I deleted mine! Go ahead and put your edit back in, as your answer is more complete WRT deferred execution. – aquinas May 31 '12 at 15:27
  • @Heinzi I think you pointed out a problem in my algorithm but not the one you thought. if set1 = {"C1", "C12", "C10", "C1", "C6", "C22", "C1", "C6"} and distinctitems = {"C1", "C12", "C10", "C6", "C22"} doesn't set1.except(DistinctItems) = {"C1", "C1", "C6"} If so, then I'd love to know how to make that work just to learn. But I like your solution better. – tullynyguy May 31 '12 at 15:28
  • [`Enumerable.Except`](http://msdn.microsoft.com/en-us/library/bb300779.aspx) computes the *set difference*, which means that *all* copies of DistinctItems are removed. In other words, `{1, 1, 2}.Except{1}` yields `{2}`, not `{1, 2}`. Think about the meaning of the word "except": If I say "I want all numbers except C1", this usually means that all instances of C1 should be removed. – Heinzi May 31 '12 at 15:33
  • @Heinzi Thanks! This was a big help. – tullynyguy May 31 '12 at 18:54