3

say, i have a string array of 50000 elements. Searching the array using For Next is so slow for such a huge array. Is there any fast way to search?

Note: Using join & instr we can search for a string in an array, but this method is no good as i can not find out the element number

Note: the array is unsorted. And i'm looking for substrings

Olle Sjögren
  • 5,315
  • 3
  • 31
  • 51
Nok Imchen
  • 2,802
  • 7
  • 33
  • 59

7 Answers7

4

Try using the Filter(InputStrings, Value[, Include[, Compare]]) function. It returns an array of the matching strings.

The complete syntax can be found on MSDN

MarkJ
  • 30,070
  • 5
  • 68
  • 111
jac
  • 9,666
  • 2
  • 34
  • 63
  • But is it quicker? BTW I took the liberty of editing the link to point to the **VB6 docs** not Vb.Net docs – MarkJ Dec 29 '11 at 12:59
  • @MarkJ - No idea because I was too lazy to make a 50,000 element array to test it on, although built-in functions are generally written in C or C++ and optimized. Also I realized if he needs the index of the element this is not complete. And thank you for fixing the link. – jac Dec 29 '11 at 16:51
  • 1
    @MarkJ You got me curious and I loaded a 50,000 item array with random words and did a few searches. Using the GetTickCount function returned 78 milliseconds for most searches. Not as good as Dick Kusleika's result, but very quick none the less. – jac Dec 29 '11 at 22:29
3

This is an expansion of your idea to use Join and InStr:

Sub TestArraySearch()
Dim A(4) As String
    A(0) = "First"
    A(1) = "Second"
    A(2) = "Third"
    A(3) = "Fourth"
    A(4) = "Fifth"
    Debug.Print FastArraySearch(A, "Fi")
    Debug.Print FastArraySearch(A, "o")
    Debug.Print FastArraySearch(A, "hird")
    Debug.Print FastArraySearch(A, "Fou")
    Debug.Print FastArraySearch(A, "ndTh")
    Debug.Print FastArraySearch(A, "fth")
End Sub

Function FastArraySearch(SearchArray As Variant,SearchPhrase As String) As String
Dim Pos As Long, i As Long, NumCharsProcessed As Long, Txt As String
    Pos = InStr(Join(SearchArray, "§"), SearchPhrase)
    If Pos > 0 Then
        For i = LBound(SearchArray) To UBound(SearchArray)
            NumCharsProcessed = NumCharsProcessed + Len(SearchArray(i)) + 1
            If NumCharsProcessed >= Pos Then
                FastArraySearch = SearchArray(i)
                Exit Function
            End If
        Next i
    End If
End Function

I did not benchmark it, but it should be quicker than doing a separate search each time through the loop. It searches once, then just adds up the string lengths until it gets to where the match was. Because the length of the string is stored before any of the characters in the string, the Len function is highly optimized.

If this performance is still unacceptable I think you will need to find a different data structure than an array (eg, a disconnected recordset, as @Remou suggested).

brettdj
  • 54,857
  • 16
  • 114
  • 177
mwolfe02
  • 23,787
  • 9
  • 91
  • 161
3

Can you show the code you're using at how long it takes? Also, how long is too long? This code reads in 50,000 strings and finds the 275 that contain substring in just over 300 milliseconds.

Sub testarr()

    Dim vaArr As Variant
    Dim i As Long
    Dim dTime As Double
    Dim lCnt As Long

    dTime = Timer

    vaArr = Sheet1.Range("A1:A50000")

    For i = LBound(vaArr, 1) To UBound(vaArr, 1)
        If InStr(1, vaArr(i, 1), "erez") > 0 Then
            lCnt = lCnt + 1
            Debug.Print i, vaArr(i, 1)
        End If
    Next i

    Debug.Print Timer - dTime
    Debug.Print lCnt

End Sub
Dick Kusleika
  • 32,673
  • 4
  • 52
  • 73
  • 2
    +1 for "how long is too long?" Does the 300 ms include the time to Debug.Print the matching results? Can you time this without the Debug.Print and share the results? – JeffK Dec 29 '11 at 16:11
  • Good point JeffK. Removing the Debug.Print in the loop, it's a hair over thirty (30) ms. That should be fast enough. – Dick Kusleika Dec 29 '11 at 19:24
1

The number one way of speeding up any array indexing operation in VB6 is to recompile the component with the following option:

  • Click Project "Properties" menu item
  • Click "Compile" Tab
  • Click "Advanced Optimizations" button
  • Check "Remove Array Bounds Checks"
  • Press Ok, etc.

Now your array indexing should be as fast as the equivalent C/C++ operation.

The only issue is that you should ensure that your code never refers to indexes outside its normal array bounds. Previously, you would get a VB runtime error. After this, you might get Access Violation instead.

Mark Bertenshaw
  • 5,594
  • 2
  • 27
  • 40
0

here's a fast way to return the number of substring occurrences. Hope it helps!

Option Explicit
Option Compare Binary
Option Base 0
DefLng A-Z
Sub TestSubStringOccurence()

Dim GrabRangeArray() As Variant
Dim i As Long
Dim L As Long
Dim RunTime As Double
Dim SubStringCounter As Long
Dim J As Long
Dim InStrPosition As Long
Dim Ws As Excel.Worksheet

Set Ws = ThisWorkbook.Sheets("Sheet1")

RunTime = Timer

With Ws    
    For i = 1 To 50000
        If i Mod 2 = 0 Then .Cells(i, 1).Value2 = "1 abcdef 2 abcdef 3 abcdef 4 abcdef 5 abcdef" _
        Else .Cells(i, 1).Value2 = i        Next i

    GrabRangeArray = .Range("a1:a50000").Value        
End With    
RunTime = Timer

'returns number of substring occurrences

For i = 1 To UBound(GrabRangeArray, 1)
    InStrPosition = 1
    Do
        InStrPosition = InStr(InStrPosition, GrabRangeArray(i, 1), "abcdef", vbBinaryCompare)
        If InStrPosition <> 0 Then
            SubStringCounter = SubStringCounter + 1
            InStrPosition = InStrPosition + 6
        End If
    Loop Until InStrPosition = 0
Next i

Debug.Print "Runtime: " & Timer - RunTime & ", ""abcdef"" occurences: " & SubStringCounter
End Sub

here's a fast way to test if a substring exists, but does not return the number of substring occurrences.

Option Explicit
Option Compare Binary
Option Base 0
DefLng A-Z
Sub TestSubStringOccurence()
Dim GrabRangeArray() As Variant
Dim I As Long
Dim L As Long
Dim RunTime As Double
Dim SubStringCounter As Long
Dim J As Long
Dim InStrPosition As Long
Dim Ws As Excel.Worksheet
Const ConstABCDEFString As String = "abcdef"
Dim B As Boolean

Set Ws = ThisWorkbook.Sheets("Sheet1")

RunTime = Timer

ReDim GrabRangeArray(0 To 49999)
With Ws
For I = 1 To 50000
    If I Mod 2 = 0 Then GrabRangeArray(I - 1) = "1 abcdef 2 abcdef 3 abcdef 4 abcdef 5 abcdef" _
    Else GrabRangeArray(I - 1) = I - 1
Next I

.Range("a1:a50000").Value = Application.Transpose(GrabRangeArray)

End With

RunTime = Timer

For I = 1 To UBound(GrabRangeArray, 1)
    If InStrB(1, GrabRangeArray(I), ConstABCDEFString, vbBinaryCompare) Then _
    SubStringCounter = SubStringCounter + 1
Next I

Debug.Print "Runtime: " & Timer - RunTime & ", ""abcdef"" occurences: " & SubStringCounter    
End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
JoeB
  • 297
  • 4
  • 20
0

Well i used Joins and Splits, didn't do any benchmark though:

Function IndexOf(ByRef arr() As String, ByVal str As String) As Integer
    Dim joinedStr As String
    Dim strIndex As Integer
    joinedStr = "|" & Join(arr, "|")
    strIndex = InStr(1, joinedStr, str)
    If strIndex = 0 Then
        IndexOf = -1
        Exit Function
    End If
    joinedStr = Mid(joinedStr, 1, strIndex - 1)
    IndexOf = UBound(Split(joinedStr, "|")) - 1
End Function
John Pangilinan
  • 953
  • 1
  • 8
  • 25
0

An improved version of John's code (if you search a string, it finds the first occourence even though the string is not completely what you are searching, ex: you search "and", your array is "me and you","just","and" it returns 1 instead of 3)

Function IndexOf(ByRef arr() As String, ByVal str As String) As Integer
Dim tuttook As Boolean
Dim joinedStr As String
Dim strIndex As Integer
strIndex = 0
tuttook = False
    joinedStr = "|" & Join(arr, "|")
    While tuttook = False
        strIndex = InStr(strIndex + 1, joinedStr, str)
        If strIndex = 0 Then
            IndexOf = -1
            Exit Function
        Else
            If Mid(joinedStr, strIndex - 1, 1) = "|" And Mid(joinedStr, strIndex + Len(str), 1) = "|" Then tuttook = True
        End If
    Wend
    joinedStr = Mid(joinedStr, 1, strIndex - 1)
    IndexOf = UBound(Split(joinedStr, "|")) - 1
End Function