19

Problem: Looking for a more efficient way of finding whether there is an exact matching value in a 1d array -- essentially a boolean true/false.

Am I overlooking something obvious? Or am I simply using the wrong data structure, by using an array when I probably should be using a collection object or a dictionary? In the latter I could check the .Contains or .Exists method, respectively

In Excel I can check for a value in a vector array like:

If Not IsError(Application.Match(strSearch, varToSearch, False)) Then
' Do stuff
End If

This returns an exact match index, obviously subject to limitations of Match function which only finds the first matching value in this context. This is a commonly used method, and one that I have been using for a long time, too.

This is satisfactory enough for Excel -- but what about other applications?

In other applications, I can do basically the same thing but requires enabling reference to the Excel object library, and then:

   If Not IsError(Excel.Application.match(...))

That seems silly, though, and is difficult to manage on distributed files because of permissions/trust center/etc.

I have tried to use the Filter() function:

 If Not Ubound(Filter(varToSearch, strSearch)) = -1 Then
    'do stuff
 End If

But the problem with this approach is that Filter returns an array of partial matches, rather than an array of exact matches. (I have no idea why it would be useful to return substring/partial matches.)

The other alternative is to literally iterate over each value in the array (this also is very commonly used I think) -- which seems even more needlessly cumbersome than calling on Excel's Match function.

For each v in vArray
   If v = strSearch Then
    ' do stuff
   End If
Next
Community
  • 1
  • 1
David Zemens
  • 53,033
  • 11
  • 81
  • 130
  • 2
    If your array can have duplicates then you may be down to the last option of looping through the array. Not really too cumbersome if you wrap it in a function. I guess the best approach is going to depend on things like the size of the array to be checked and how many lookups you need to do. If you need fast lookup speed then you can load your array to a dictionary and account for duplicates by storing an array of indexes as the "value". – Tim Williams Sep 12 '13 at 03:36
  • Hey @TimWilliams sorry I didn't make it clear enough -- all I'm testing for is boolean; does value exist in the array. Specifically, is there a better method than relying on `Excel.Application.Match` to do this e.g., in Word, or PowerPoint, etc.? – David Zemens Sep 12 '13 at 13:31
  • 1
    If the array is 1 dimension (or manipulated to 1d) then you can `Join` the array to a string and use `Instr` to test if match exists. It should be fast but haven't tested if faster than Match. –  Sep 12 '13 at 14:27
  • @ooo yes that is another method I sometimes use, but I have not tested it for speed, either. – David Zemens Sep 12 '13 at 15:03

3 Answers3

30

If we're going to talk about performance then there's no substutute for running some tests. In my experience Application.Match() is up to ten times slower than calling a function which uses a loop.

Sub Tester()

    Dim i As Long, b, t
    Dim arr(1 To 100) As String

    For i = 1 To 100
        arr(i) = "Value_" & i
    Next i

    t = Timer
    For i = 1 To 100000
        b = Contains(arr, "Value_50")
    Next i
    Debug.Print "Contains", Timer - t

    t = Timer
    For i = 1 To 100000
        b = Application.Match(arr, "Value_50", False)
    Next i
    Debug.Print "Match", Timer - t

End Sub


Function Contains(arr, v) As Boolean
Dim rv As Boolean, lb As Long, ub As Long, i As Long
    lb = LBound(arr)
    ub = UBound(arr)
    For i = lb To ub
        If arr(i) = v Then
            rv = True
            Exit For
        End If
    Next i
    Contains = rv
End Function

Output:

Contains       0.8710938 
Match          4.210938 
Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • I stopped the second loop at `i = 4426`, that was almost 2 minutes, versus about 7 seconds for the Function loop which executed 100000 times. I guess that settles it. – David Zemens Sep 12 '13 at 16:47
  • See what time you get adding this :) `t = Timer: x = Join(arr, ","): For i = 1 To 100000: b = InStr(1, x, "Value_50", vbBinaryCompare): Next i: Debug.Print "Binary", Timer - t` –  Sep 12 '13 at 17:56
  • Faster again, but needs some adjustment so that "Value_5" doesn't also match "Value_50". – Tim Williams Sep 12 '13 at 18:18
  • That's where the delimiter comes in handy Value_5, –  Sep 12 '13 at 18:38
  • So: `x = Chr(0) & Join(arr, Chr(0)) & Chr(0):...: b = InStr(1, x, Chr(0) & "Value_50" & Chr(0), vbBinaryCompare)` – Tim Williams Sep 12 '13 at 18:42
  • 2
    The array method is faster than using .MATCH if the array already exists as a VBA array because each call to .MATCH incurs the large overhead of transferring the array to something that .MATCH can process. If the data is an Excel range it is much faster to use .MATCH (avoids the data transformation overhead) – Charles Williams Aug 23 '14 at 14:49
  • Re-tested and confirmed both @Tim's and Charles' statements. (So `Application.WorksheetFunction`'s are fastest on worksheets, go figure!) – ashleedawg May 22 '18 at 18:03
1

I used to look for a best replace solution. It should work for simple finding as well.

To find first instance of a string you can try using this code:

Sub find_strings_1()

Dim ArrayCh() As Variant
Dim rng As Range
Dim i As Integer

 ArrayCh = Array("a", "b", "c")

With ActiveSheet.Cells
    For i = LBound(ArrayCh) To UBound(ArrayCh)
        Set rng = .Find(What:=ArrayCh(i), _
        LookAt:=xlPart, _
        SearchOrder:=xlByColumns, _
        MatchCase:=False)

        Debug.Print rng.Address

    Next i
End With

End Sub

If you want to find all instances try the below.

Sub find_strings_2()

Dim ArrayCh() As Variant
Dim c As Range
Dim firstAddress As String
Dim i As Integer

 ArrayCh = Array("a", "b", "c") 'strings to lookup

With ActiveSheet.Cells
    For i = LBound(ArrayCh) To UBound(ArrayCh)
        Set c = .Find(What:=ArrayCh(i), LookAt:=xlPart, LookIn:=xlValues)

        If Not c Is Nothing Then
            firstAddress = c.Address 'used later to verify if looping over the same address
            Do
                '_____
                'your code, where you do something with "c"
                'which is a range variable,
                'so you can for example get it's address:
                Debug.Print ArrayCh(i) & " " & c.Address 'example
                '_____
                Set c = .FindNext(c)

            Loop While Not c Is Nothing And c.Address <> firstAddress
        End If
    Next i
End With

End Sub

Keep in mind that if there are several instances of searched string within one cell it will return only one result due to the specific of FindNext.

Still, if you need a code for replacing found values with another, I'd use the first solution, but you'd have to change it a bit.

GrzMat
  • 36
  • 3
  • Thanks but I should have clarified I'm not doing this in Excel and specifically since I'm not doing it in Excel, I can't use any approach that uses `Worksheet` object. In any case, I'm not sure how this code is more efficient than the `Match` function -- all I'm interested in is *does a value exist in the array*, which is just a `true/false`. – David Zemens Sep 12 '13 at 13:28
  • what about `If InStr(1, myString, strLookFor) = 0 Then` do stuff... ? – GrzMat Sep 12 '13 at 14:28
1

"A more efficient way (compared to Application.Match)of finding whether a string value exists in an array":

I believe there is no more efficient way than the one you are using, i.e., Application.Match.

Arrays allow efficient access in any element if we know the index of that element. If we want to do anything by element value (even checking if an element exists), we have to scan all the elements of the array in the worst case. Therefore, the worst case needs n element comparisons, where n is the size of the array. So the maximum time we need to find if an element exists is linear in the size of the input, i.e., O(n). This applies to any language that uses conventional arrays.

The only case where we can be more efficient, is when the array has special structure. For your example, if the elements of the array are sorted (e.g. alphabetically), then we do not need to scan all the array: we compare with the middle element, and then compare with the left or right part of the array (binary search). But without assuming any special structure, there is no hope..

The Dictionary/Collection as you point, offers constant key access to their elements (O(1)). What perhaps is not very well documented is that one can also have index access to the dictionary elements (Keys and Items): the order in which elements are entered into the Dictionary is preserved. Their main disadvantage is that they use more memory as two objects are stored for each element.

To wrap up, although If Not IsError(Excel.Application.match(...)) looks silly, it is still the more efficient way (at least in theory). On permission issues my knowledge is very limited. Depending on the host application, there are always some Find-type functions (C++ has find and find_if for example).

I hope that helps!

Edit

I would like to add a couple of thoughts, after reading the amended version of the post and Tim's answer. The above text is focusing on the theoretical time complexity of the various data structures and ignores implementation issues. I think the spirit of the question was rather, "given a certain data structure (array)", what is the most efficient way in practice of checking existence.

To this end, Tim's answer is an eye-opener.

The conventional rule "if VBA can do it for you then don't write it again yourself" is not always true. Simple operations like looping and comparisons can be faster that "agreegate" VBA functions. Two interesting links are here and here.

Community
  • 1
  • 1
Ioannis
  • 5,238
  • 2
  • 19
  • 31
  • 1
    "here is no more efficient way than the one you are using, i.e., Application.Match" --the poster specifies a solution that does not depend on Excel-specific functions. – johny why Mar 08 '16 at 21:17