I have a list of 8,200 items in a graphicList. I need to sort this list and remove the duplicates so that only the unique values remain. I tried the function below, but it is not working.

Here's the code I was using:

Private Sub RemoveDupes(ByRef Items As List(Of String), Optional ByVal NeedSorting As Boolean = False)

    Dim Temp As New List(Of String)

    'Remove Duplicates
    For Each Item As String In Items
        'Check if item is in Temp
        If Not Temp.Contains(Item) Then
            'Add item to list.
            Temp.Add(Item)
        End If
        statusText = "Removing Duplicate Images in List"
    Next Item

    'Send back new list.
    Items = Temp
End Sub
mightymax
  • Side note: always prefer a function over a procedure with ByRef parameters. The code will be easier to read and maintain. – LarsTech Aug 27 '19 at 20:02
  • I figured out where the first list of files would come from: `Dim XMLFiles = Directory.EnumerateFiles(CopyToPath, "*.*", SearchOption.AllDirectories)`. The LINQ looks good, but I didn't understand how I could use it. @Andrew Morton – mightymax Aug 27 '19 at 20:03
  • Use a Function, not a Sub. Have you tried to just `Return Items.Distinct().ToList()`? – Jimi Aug 27 '19 at 20:03
  • It lists the same graphic a hundred times in the missingGraphicsList. – mightymax Aug 27 '19 at 20:14
  • Oh, I just had an aha moment. There is no second list for the remove-dupes function. It's one list I'm trying to remove duplicates from. – mightymax Aug 27 '19 at 20:23

2 Answers

To sort the list and remove duplicates, use the List(Of T).Sort and Enumerable.Distinct methods.

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim lstStrings As New List(Of String) From {"Mathew", "Mark", "Mathew", "Luke", "John", "Luke", "Mark", "Mathew"}
    Dim CleanList = lstStrings.Distinct.ToList
    CleanList.Sort()
    For Each item In CleanList
        Debug.Print(item)
    Next
End Sub
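
Following the suggestion in the comments to use a Function rather than a Sub, the same idea could be wrapped up roughly as below. This is only a sketch: the module name DedupeSketch and the function name GetUniqueItems are made up for illustration, and the optional needSorting flag simply mirrors the one in the question.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DedupeSketch
    ' Hypothetical helper: returns a new list of unique items and leaves the input list unchanged.
    Function GetUniqueItems(items As List(Of String), Optional needSorting As Boolean = False) As List(Of String)
        ' Distinct removes duplicate values using the default equality comparer.
        Dim result As List(Of String) = items.Distinct().ToList()
        ' Sort in place only when the caller asks for it.
        If needSorting Then result.Sort()
        Return result
    End Function

    Sub Main()
        Dim names As New List(Of String) From {"Mathew", "Mark", "Mathew", "Luke", "John", "Luke"}
        For Each item In GetUniqueItems(names, True)
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

Because the function returns a new list, the caller decides whether to overwrite the original, for example `graphicList = GetUniqueItems(graphicList, True)`.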
Mary

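Using a loop can be faster than LINQ, although each IndexOf call scans the list linearly, so the cost grows quadratically with the size of the list.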

In versions prior to .NET 4.0, Contains isn't the fastest method and it's faster to use IndexOf instead (under the hood, Contains actually calls IndexOf). From .NET 4 onward this is no longer the case; see: Is String.Contains() faster than String.IndexOf()?

Private Function RemoveDupes(ByVal Items As List(Of String), Optional ByVal NeedSorting As Boolean = False) As List(Of String)
    Dim Temp As New List(Of String)
    For Each Item As String In Items
        'IndexOf returns -1 when the item is not in Temp yet.
        If Temp.IndexOf(Item) = -1 Then Temp.Add(Item)
    Next Item
    statusText = "Removing Duplicate Images in List"  'Outside the loop
    'Sort only when the caller asks for it.
    If NeedSorting Then Temp.Sort()
    Return Temp
End Function

Indexing into the list with a For loop, as you would with an array, may be slightly faster:

Private Function RemoveDupes(ByVal Items As List(Of String), Optional ByVal NeedSorting As Boolean = False) As List(Of String)
    Dim Temp As New List(Of String)
    Dim i As Integer
    For i = 0 To Items.Count - 1
        'VB.NET indexes with parentheses; IndexOf returns -1 when the item is not found.
        If Temp.IndexOf(Items(i)) = -1 Then Temp.Add(Items(i))
    Next i
    statusText = "Removing Duplicate Images in List"
    'Sort only when the caller asks for it.
    If NeedSorting Then Temp.Sort()
    Return Temp
End Function

I don't think you'll see much of a performance improvement, and unless this is real-time code, does it matter? It's a trade-off: are you happy to sacrifice code maintainability for speed? If yes, rewrite it in C++. Here's a closely related Jeff Atwood article that's relevant to the question: https://blog.codinghorror.com/the-sad-tragedy-of-micro-optimization-theater/
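
If the cost of searching the list ever did matter, one common alternative (not part of the code above) is to track the values already seen in a HashSet(Of String), whose Add method returns False for a value it already contains. A minimal sketch, where the names HashSetSketch and RemoveDupesWithSet are hypothetical:

```vb
Imports System
Imports System.Collections.Generic

Module HashSetSketch
    ' Hypothetical variant of RemoveDupes that uses a hash set for the membership test.
    Function RemoveDupesWithSet(items As List(Of String), Optional needSorting As Boolean = False) As List(Of String)
        Dim seen As New HashSet(Of String)()
        Dim result As New List(Of String)()
        For Each item As String In items
            ' HashSet.Add returns True only when the value was not already present.
            If seen.Add(item) Then result.Add(item)
        Next
        ' Sort in place only when the caller asks for it.
        If needSorting Then result.Sort()
        Return result
    End Function
End Module
```

Without sorting, the result keeps the first occurrence of each item in its original position, which matches what the loop versions produce.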

Jeremy Thompson