I have some legacy code which consists of:

A Dictionary (dictParts) populated during startup with approx. 350,000 items (does not change during runtime). dictParts is a System.Collections.Generic.Dictionary(Of String, System.Data.DataRow).

Each item in dictParts is a System.Collections.Generic.KeyValuePair(Of String, System.Data.DataRow).

An Array (arrOut) which frequently has items added and removed (typically between 2-6 items in the array). arrOut is a System.Array containing only strings.

Each time the array changes I need to see if:

  • All the items in the array exist in the index
  • Some of the items in the array exist in the index

I imagine that looping through all 350,000 items in the index every time the array changes would cause a massive performance hit, so I looked to LINQ for help.

I have tried the following:

Private Sub btnTest_Click(sender As System.Object, e As System.EventArgs) Handles btnTest.Click

    Dim dictParts = New Dictionary(Of Integer, String) _
                    From {{1, "AA-10-100"}, _
                          {2, "BB-20-100"}, _
                          {3, "CC-30-100"}, _
                          {4, "DD-40-100"}, _
                          {5, "EE-50-100"}}


    Dim arrOut() As String = {"AA-10-100", "BB-20-100", "CC-30-100"}

    'Tried
    Dim allPartsExist As IEnumerable(Of String) = arrOut.ToString.All(dictParts)
    'And this
    Dim allOfArrayInIndex As Object = arrOut.ToString.Intersect(dictParts).Count() = arrOut.ToString.Count

End Sub

I keep getting errors: Unable to cast object of type 'System.Collections.Generic.Dictionary`2[System.Int32,System.String]' to type 'System.Collections.Generic.IEnumerable`1[System.Char]'.

Please could someone advise where I am going wrong.

NetMage
GoodJuJu
  • If performance is your concern, [maybe you could try looking at using a hashset instead of an array?](https://stackoverflow.com/questions/4558754/define-what-is-a-hashset) – emsimpson92 Jun 06 '18 at 22:47
  • I think you might mean `arrOut[]` rather than `arrOut()`. arrOut is supposed to be an array of strings rather than a method, am I right? I'm not as familiar with VB as I am with C#. – emsimpson92 Jun 06 '18 at 22:51
  • @emsimpson92 Arrays in VB.NET are written arr() with parentheses. Also, when an array is declared with a size, as in arr(6), the 6 is the upper bound, not the number of elements as in C#. – Mary Jun 06 '18 at 23:35
  • How big is the array? If the array is small then it doesn't matter how big the `Dictionary` is. You can just call `myArray.All(Function(s) myDictionary.ContainsKey(s))`. The point of a `Dictionary` is that key access is very fast, so testing for a small number of keys in a big `Dictionary` is still going to be fast. – jmcilhinney Jun 07 '18 at 01:20
  • That said, do you really need to test the whole array every time? After the first check, surely you only need to check the new value when an element changes. If you were to restrict access to changing an element to a single method then you could check the `Dictionary` for just the new value in that method. – jmcilhinney Jun 07 '18 at 01:22
  • @jmcilhinney Did you notice the key of the `Dictionary` is `Integer` and it is the value that is being tested for membership? – NetMage Jun 07 '18 at 18:18
  • LINQ is not faster than looping, it is slower than looping. Using the correct data structure is what you need for speeding up testing of such a large collection. – NetMage Jun 07 '18 at 18:19
  • How often does `dictParts` change? What does the `Integer` key represent? – NetMage Jun 07 '18 at 18:33
  • `dictParts` (350,000 items) does not change. It gets populated during startup. The array `arrOut` of approximately 2-6 items gets cleared and repopulated frequently. After each 'refresh' of the array I need to check whether all the array items (strings) exist in the dictionary (strings). – GoodJuJu Jun 07 '18 at 23:13

2 Answers

To learn something, I tried the hashset suggested by @emsimpson92. Maybe it can work for you.

Imports System.Text
Public Class HashSets
    Private shortList As New HashSet(Of String)
    Private longList As New HashSet(Of String)

    Private Sub HashSets_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        shortList.Add("AA-10-100")
        shortList.Add("BB-20-100")
        shortList.Add("DD-40-101")
        Dim dictParts As New Dictionary(Of Integer, String) _
        From {{1, "AA-10-100"},
                          {2, "BB-20-100"},
                          {3, "CC-30-100"},
                          {4, "DD-40-100"},
                          {5, "EE-50-100"}}
        For Each kv As KeyValuePair(Of Integer, String) In dictParts
            longList.Add(kv.Value)
        Next
        'Two alternative ways to fill the hashset
        '1. Remove the New from the declaration
        'longList = New HashSet(Of String)(dictParts.Values)
        '2. Added in Framework 4.7.2
        'Enumerable.ToHashSet(Of TSource) Method (IEnumerable(Of TSource))
        'longList = dictParts.Values.ToHashSet()
    End Sub

    Private Sub CompareHashSets()
        Debug.Print($"The short list has {shortList.Count} elements")
        DisplaySet(shortList)
        Debug.Print($"The long list has {longList.Count}")
        shortList.ExceptWith(longList)
        Debug.Print($"The items missing from the longList {shortList.Count}")
        DisplaySet(shortList)
        'Immediate Window Results
        'The short list has 3 elements
        '{ AA-10-100
        ' BB-20-100
        ' DD-40-101
        '}
        'The long list has 5
        'The items missing from the longList 1
        '{ DD-40-101
        '}
    End Sub

    Private Shared Sub DisplaySet(ByVal coll As HashSet(Of String))
        Dim sb As New StringBuilder()
        sb.Append("{")
        For Each s As String In coll
            sb.AppendLine($" {s}")
        Next
        sb.Append("}")
        Debug.Print(sb.ToString)
    End Sub

    Private Sub btnCompare_Click(sender As Object, e As EventArgs) Handles btnCompare.Click
        CompareHashSets()
    End Sub
End Class

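Note: The elements in a hashset must be unique, so if the dictionary contains duplicate values (not duplicate keys, duplicate values) the hashset keeps only one copy of each. HashSet.Add returns False for a duplicate rather than throwing, so the filling code still runs, but longList.Count can be smaller than dictParts.Count.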
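
The two checks from the question (all items present, or only some) can also be answered with the hashset's own set operations, without modifying the short list the way ExceptWith does. A minimal sketch; the module name SetChecks, the helper names AllPresent and AnyPresent, and the sample values are illustrative assumptions, not part of the code above:

```vbnet
Imports System
Imports System.Collections.Generic

Module SetChecks
    ' True when every string in items is contained in knownParts.
    Function AllPresent(knownParts As HashSet(Of String), items As IEnumerable(Of String)) As Boolean
        Return knownParts.IsSupersetOf(items)
    End Function

    ' True when at least one string in items is contained in knownParts.
    Function AnyPresent(knownParts As HashSet(Of String), items As IEnumerable(Of String)) As Boolean
        Return knownParts.Overlaps(items)
    End Function

    Sub Main()
        ' Stand-in for the 350,000 part numbers loaded at startup.
        Dim longList As New HashSet(Of String) From {
            "AA-10-100", "BB-20-100", "CC-30-100", "DD-40-100", "EE-50-100"}

        ' Stand-in for the small, frequently changing array.
        Dim arrOut() As String = {"AA-10-100", "BB-20-100", "DD-40-101"}

        Console.WriteLine(AllPresent(longList, arrOut)) ' False: DD-40-101 is not in the set
        Console.WriteLine(AnyPresent(longList, arrOut)) ' True: AA-10-100 is in the set
    End Sub
End Module
```

Both calls only read the two collections, so they can be repeated every time the array is refreshed.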

Mary
  • You don't need a `HashSet` for the `Array`. – NetMage Jun 07 '18 at 18:32
  • Good to know. I guess Array.Except hashes the array. Yes? – Mary Jun 07 '18 at 21:33
  • We know the `Array` is small (up to 6 items). We aren't looking up stuff in the `Array`, which is the main advantage of `HashSet`. We have to test every item in the `Array` to determine status. There is no `Array.Except` but `Enumerable.Except` uses a specialized private version of a `HashSet` to compute the answer. – NetMage Jun 07 '18 at 21:47
  • I really appreciate you taking the time to provide clear working code examples. I have done some tests and it looks like it might work. @NetMage has provided another answer which also appears suitable. I will conduct some testing and come back to you. Thanks again. – GoodJuJu Jun 07 '18 at 23:33

Running a test with a HashSet versus the original Dictionary containing 350,000 values, with the matching items added last to the Dictionary, the HashSet is over 15,000 times faster.

Testing against the original Dictionary:

Dim AllInDict = arrOut.All(Function(a) dictParts.ContainsValue(a))
Dim SomeInDict = arrOut.Any(Function(a) dictParts.ContainsValue(a))

Creating the HashSet takes as long as four Dictionary searches, so it isn't worth it if the Dictionary changes more often than once every four searches.

Dim hs = New HashSet(Of String)(dictParts.Values)

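Then you can use the HashSet to test membership, which is at least 14,000 times faster than searching the entire Dictionary. (That figure is for the worst case, where the matching items are at the end; on average a value search stops about halfway through, so the typical speedup is about half of that.)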

Dim AllInDict2 = arrOut.All(Function(a) hs.Contains(a))
Dim SomeInDict2 = arrOut.Any(Function(a) hs.Contains(a))
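
If the real dictionary is a Dictionary(Of String, DataRow), as the question's description says, and the strings in the array are the dictionary's keys (an assumption, since the question does not state it), then no extra HashSet is needed, because ContainsKey is already a hashed lookup. A minimal sketch under that assumption; the module name KeyChecks, the helper names, and the PartNo column are hypothetical:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module KeyChecks
    ' True when every string in items is a key of parts.
    Function AllKeysExist(parts As Dictionary(Of String, DataRow), items As IEnumerable(Of String)) As Boolean
        Return items.All(Function(a) parts.ContainsKey(a))
    End Function

    ' True when at least one string in items is a key of parts.
    Function AnyKeyExists(parts As Dictionary(Of String, DataRow), items As IEnumerable(Of String)) As Boolean
        Return items.Any(Function(a) parts.ContainsKey(a))
    End Function

    Sub Main()
        ' Build a tiny stand-in for the real table; PartNo is a made-up column name.
        Dim table As New DataTable()
        table.Columns.Add("PartNo", GetType(String))

        Dim dictParts As New Dictionary(Of String, DataRow)
        For Each partNo As String In New String() {"AA-10-100", "BB-20-100", "CC-30-100"}
            Dim row As DataRow = table.NewRow()
            row("PartNo") = partNo
            table.Rows.Add(row)
            dictParts(partNo) = row
        Next

        Dim arrOut() As String = {"AA-10-100", "DD-40-101"}
        Console.WriteLine(AllKeysExist(dictParts, arrOut)) ' False: DD-40-101 is not a key
        Console.WriteLine(AnyKeyExists(dictParts, arrOut)) ' True: AA-10-100 is a key
    End Sub
End Module
```

If the part number is instead stored in a column of each DataRow rather than in the key, this does not apply, and building a HashSet from that column once at startup would be the closer equivalent of the approach above.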
NetMage
  • I am very grateful to you for taking the time to respond with such clean, concise answers. When I try to use your code above (Dim AllInDict & Dim SomeInDict) it throws an error: Error 16 Value of type 'String' cannot be converted to 'System.Data.DataRow'. – GoodJuJu Jun 07 '18 at 23:27
  • @GoodJuJu I based my answer on your example. Since you stated `arrOut` contains strings and the values in `dictParts` are `DataRow`s, how do you expect to check if a member of `arrOut` is in `dictParts`? You didn't provide any information about the `DataRow` in your question. I assumed the "index" referred to in your question is `dictParts`, though that isn't stated. Perhaps change your sample to show `DataRow`? – NetMage Jun 08 '18 at 00:07