1

I am looking for the quickest way to look up the value in worksheet data and give the corresponding value in another column. The lookup must be done in VBA. Only 1 lookup is to be performed (no more lookups are to be done on the same dataset).

For instance, we have data in this form:
VBA VLOOKUP performance

Using VBA, what is the fastest way to find a value in the column B which corresponds to the value "key990000" in the column A?

ZygD
  • 22,092
  • 39
  • 79
  • 102
  • Your findings seem to be at odds with [this similar question](https://stackoverflow.com/questions/18754096/matching-values-in-string-array). Might be worth reviewing both :D Cheers. – David Zemens Jan 18 '18 at 16:08
  • 1
    @DavidZemens Thank you. It is all ok :) I have included one more function and redone all the tests. In the referenced question, `match` is done for array, while on my tests previously I only had it on worksheet data directly. Now, `match_in_array` is the `match` where an array is passed as an argument to look in. This is slower than looping in array (`loop_in_array`). But `WorksheetFunction.Match()`, when you pass a range to it, is faster than looping through an array. – ZygD Jan 19 '18 at 14:16

2 Answers2

4

I have tested several different functions in contexts of sorted and unsorted data, 100k and 1 million rows of data.

The fastest way is to use WorksheetFunction.Vlookup or a combination of WorksheetFunction.Index and WorksheetFunction.Match. But in case 2 or more searches are to be done on the same column, then it's best to load data into array (which takes relatively more time) and loop through it (looping through a loaded array is extremely quick).

The summary results of performance tests (having both, 100 000 and 1 million rows of data)

                | 100k rows   | 1m rows     |
---------------------------------------------
 Sub            | sort | uns  | sort | uns  |
---------------------------------------------
 WsF_vlookup    | 0.05 | 0.05 | 0.25 | 0.38 |
 WsF_idx_match  | 0.05 | 0.05 | 0.25 | 0.38 |
 loop_in_array  | 0.06 | 0.06 | 0.35 | 0.43 | - this is better for 2+ lookups
 range_find     | 0.10 | 0.12 | 0.80 | 0.95 |
 match_in_array | 0.11 | 0.11 | 0.65 | 0.80 |
 loop_in_sheet  | 0.14 | 0.16 | 1.2  | 1.39 |
 array_to_dict  | 0.5  | 0.65 | 61   | 87   |
 sheet_to_dict  | 1.5  | 1.70 | 75   | 100  |
---------------------------------------------

Used subroutines

Sub WsF_vlookup()
  Dim timer0 As Single

  timer0 = Timer()
  Debug.Print Application.WorksheetFunction.VLookup("key990000", ThisWorkbook.Worksheets("Sheet1").Range("A1:B1000000"), 2, 0)
  Debug.Print Timer - timer0

End Sub

Sub WsF_idx_match()
  Dim timer0 As Single
  Dim rw As Long

  timer0 = Timer()
  rw = Application.WorksheetFunction.Match("key990000", ThisWorkbook.Worksheets("Sheet1").Range("A1:A1000000"), 0)
  Debug.Print Application.WorksheetFunction.Index(ThisWorkbook.Worksheets("Sheet1").Range("B1:B1000000"), rw)
  'no difference from:
  'Debug.Print ThisWorkbook.Worksheets("Sheet1").Cells(rw, 2)
  Debug.Print Timer - timer0

End Sub

Sub loop_in_array()
  Dim timer0 As Single
  Dim myArray1() As Variant
  Dim i As Long

  timer0 = Timer()

  'Reading rows takes the majority of time
  myArray1 = ThisWorkbook.Worksheets("Sheet1").Range("A1:A1000000").Value

  'For 1m unsorted rows the following part takes only 0.06s when the key is near the end
  For i = 1 To UBound(myArray1, 1)
    If myArray1(i, 1) = "key990000" Then
      Debug.Print ThisWorkbook.Worksheets("Sheet1").Cells(i, 2).Value
      Exit For
    End If
  Next

  Debug.Print Timer - timer0

End Sub

Sub range_find()
  Dim timer0 As Single
  Dim rngFound As Range

  timer0 = Timer()

  Set rngFound = ThisWorkbook.Worksheets("Sheet1").Range("A1:A1000000").Find("key990000", , xlValues, xlWhole)

  Debug.Print rngFound.Offset(0, 1).Value
  Debug.Print Timer - timer0

End Sub

Sub match_in_array()
  Dim timer0 As Single
  Dim myArray1() As Variant
  Dim lngRow As Long

  timer0 = Timer()

  'Reading rows takes half of the time
  myArray1 = ThisWorkbook.Worksheets("Sheet1").Range("A1:A1000000").Value

  'For 1m unsorted rows the following part takes 0.45s when the key is near the end
  lngRow = Application.WorksheetFunction.Match("key990000", myArray1, 0)
  Debug.Print ThisWorkbook.Worksheets("Sheet1").Cells(lngRow, 2)
  Debug.Print Timer - timer0

End Sub

Sub loop_in_sheet()
  Dim timer0 As Single
  Dim i As Long
  Dim cell As Range

  timer0 = Timer()

  For Each cell In ThisWorkbook.Worksheets("Sheet1").Range("A1:A1000000")
    If cell.Value = "key990000" Then
      Debug.Print ThisWorkbook.Worksheets("Sheet1").Range("B" & cell.Row).Value
      Exit For
    End If
  Next

  Debug.Print Timer - timer0

End Sub

Sub array_to_dict()
  Dim timer0 As Single
  Dim myArray1() As Variant
  Dim dict As Object
  Dim i As Long

  timer0 = Timer()

  myArray1 = ThisWorkbook.Worksheets("Sheet1").Range("A1:B1000000").Value

  Set dict = CreateObject("Scripting.Dictionary")
  For i = 1 To UBound(myArray1, 1)
    dict(myArray1(i, 1)) = myArray1(i, 2)
  Next

  Debug.Print dict("key990000")
  Debug.Print Timer - timer0

  Set dict = Nothing
End Sub

Sub sheet_to_dict()
  Dim timer0 As Single
  Dim dict As Object
  Dim cell As Range

  timer0 = Timer()

  Set dict = CreateObject("Scripting.Dictionary")
  For Each cell In ThisWorkbook.Worksheets("Sheet1").Range("A1:A1000000")
    dict(cell.Value) = ThisWorkbook.Worksheets("Sheet1").Range("B" & cell.Row).Value
  Next

  Debug.Print dict("key990000")
  Debug.Print Timer - timer0

  Set dict = Nothing
End Sub
ZygD
  • 22,092
  • 39
  • 79
  • 102
  • Good post. I did a similar test in the past. I think what you need to consider is the overhead in loading the entries into a dictionary. To be more specific, if you are only looking up a single value, then the overhead of loading the entire worksheet into a dictionary makes it less viable. It may be beyond the scope of your post, but consider a case where you're going to performing multiple lookups against the same range. Then, you only load into a dictionary once and perform lookups against that dictionary. You will see a sizable performance gain. – basodre Jan 18 '18 at 13:44
  • Another question: did you run this as a "worst case" scenario? I'm not sure how VBA performs each of these functions, but did you make it so that the lookup value ("key990000") was the last entry in the lookup range? That might give a better idea of how each of these routines works. – basodre Jan 18 '18 at 13:48
  • @user3561813 - Yes, I did make sure "key990000" was far in the bottom of the data, both, when sorted and unsorted. Also, you are correct about the overhead of loading data into the dictionary. This test assumes the need to do only 1 or a few more of lookups for the same dataset. I may do another test to see when it is worth using a dictionary. In VBA, so far my only use of dictionaries was to create a list of unique entries. – ZygD Jan 18 '18 at 15:52
  • @ZygD I think you'll see a decent boost in performance. For example, I ran 1 million lookups against a 10,000 record range using both a Dictionary and VBA Vlookup. The results: `1000K lookups take: 3.820313 seconds to complete with Dictionary against a lookup range of 10000 records!` vs `1000K lookups take: 95.65234 seconds to complete with Vlookup against a lookup range of 10000 records!` – basodre Jan 18 '18 at 16:13
  • @user3561813 This is a great observation! I have appended one more sentence in the question which says that this test aims to test the performance of only 1 lookup on the same range. – ZygD Jan 18 '18 at 16:23
  • Did you realize that you used two array assignments in the `loop_in_array()` and `match_in_array()` tests, but only one in `array_to_dict()`. Didn't make much of a difference, just inconsistent. Also, you don't specifically need to read Column B into an array at all, you just need to return that one value (runs in half the time). Just copying the values into the variant array is over 80% of the time. Not sure how you got the results you did; On my computer `WsF_vlookup` ran in **1/7th** the time that `loop_in_array` did (***1/4th*** if I only copied Column A). – Profex Oct 02 '18 at 01:44
  • @Profex - Thank you for an excellent remark! I have updated the results after retesting on the same machine. Now it's very clear that if 2+ searches are needed on the same dataset, it's best to loop through array (at least on my machine). – ZygD Oct 02 '18 at 13:41
0

In a similar case my code has to search for the row of a (unique) string in a column with ~20000 items multiple times. I tried loop_in_array, WsF_idx_match and get_row_from_collection.
Here the results for 10000 searches of that string which sits in the last row of the column:

  • string found by loop_in_array in 47125ms; row =>19815<

  • string found by match in sheet (WsF_idx_match) in 13015ms; row =>19815<

  • string found by search in collection in 47ms; row =>19815<

      On Error Resume Next
      For idx = 1 To loopCnt
          myRow = 0                           ' = not found
          myRow = collFIDs(strID)
      Next idx
    

    collFIDs is create this way:

      Function buildColl_Feats_Rows() As Collection
      Dim collRet             As Collection
      Dim rowFID              As Long
      Dim lastRow             As Long
      Dim strFID              As String
    
          lastRow = ThisWorkbook.Worksheets("Sheet1").Cells(Rows.Count, "A").End(xlUp).row
          Set collRet = New Collection
    
          On Error Resume Next
    
          For rowFID = 1 To lastRow
              strFID = ThisWorkbook.Worksheets("Sheet1").Cells(rowFID, 1).Value
              collRet.add CVar(rowFID), strFID
          Next rowFID
    
          Set buildColl_Feats_Rows = collRet
      End Function
    

    ...which takes <200ms (for <20000 rows in sheet)

Joe Phi
  • 340
  • 1
  • 4
  • 14