1

I have a problem with filtering and adding data to collections in my Excel VBA code.

I have two workbooks: SourceWB and SourceTR. I gather data from both and list them in the SourceTR. The goal is to compare both data sets and find mismatches. The code is run when the SourceTR is active.

I've omitted the rest of the code, here is just the problematic part:

Debug.Print "3 -- " & Now
For Each i In Workbooks("SourceTR").Worksheets("Source1").Range("A4:A10000")
    If i.Value <> "" Then
        If month(i.Value) = selected_month Then
            item_1 = Worksheets("Source1").Range("E" & i.row).Value
            item_2 = Worksheets("Source1").Range("F" & i.row).Value
            item_3 = Worksheets("Source1").Range("K" & i.row).Value
            entry = item_1 & "_" & item_2 & "_" & item_3
            If IsInCollection(init_tr_entries, entry) = False Then
                init_tr_entries.Add (entry)
            End If
        End If
    End If
Next i

Debug.Print "4 -- " & Now

Dim coll_item
For Each coll_item In init_tr_entries
    Workbooks("SourceTR").Worksheets("target").Range("A" & starting_row_1).Value = Split(coll_item, "_")(0)
    Workbooks("SourceTR").Worksheets("target").Range("B" & starting_row_1).Value = Split(coll_item, "_")(1)
    Workbooks("SourceTR").Worksheets("target").Range("C" & starting_row_1).Value = Split(coll_item, "_")(2)
    starting_row_1 = starting_row_1 + 1
Next coll_item

Debug.Print "5 -- " & Now

Dim a As Range
Dim user As String
user = Worksheets("vir").Range("G2").Value

Dim init_as_entries As New Collection

For Each a In Workbooks("SourceWB"))).Worksheets("Source2")).Range("BU4:BU10000")
    If a.Value <> "" Then
        If a.Value = "" & selected_month & "" Then
            If Workbooks("SourceWB"))).Worksheets("Source2")).Range("F" & a.row).Value = user Then
                item_1 = Workbooks("SourceWB"))).Worksheets("Source2")).Range("A" & a.row).Value
                item_2 = Workbooks("SourceWB"))).Worksheets("Source2")).Range("B" & a.row).Value
                item_3 = Workbooks("SourceWB"))).Worksheets("Source2")).Range("E" & a.row).Value
                entry = item_1 & "_" & item_2 & "_" & item_3
                init_as_entries.Add (entry)
            End If
        End If
    End If
Next a

For Each coll_item In init_as_entries
    Workbooks("SourceTR").Worksheets("target").Range("F" & starting_row_2).Value = Split(coll_item, "_")(0)
    Workbooks("SourceTR").Worksheets("target").Range("G" & starting_row_2).Value = Split(coll_item, "_")(1)
    Workbooks("SourceTR").Worksheets("target").Range("H" & starting_row_2).Value = Split(coll_item, "_")(2)
    starting_row_2 = starting_row_2 + 1
Next coll_item

Debug.Print "6 -- " & Now

The code between point 3 and 5 takes about 1 second and the code between 5 and 6 takes about 10 seconds. However, other that some filtering I don't see any difference in the code.

The data sets are small, 2500 non blank rows in SourceWB and only 60 in SourceTR.

What am I doing wrong?

---EDIT--- I've done some additional measurements and this part:

For Each a In Workbooks("SourceWB"))).Worksheets("Source2")).Range("BU4:BU10000")
    If a.Value <> "" Then
        If a.Value = "" & selected_month & "" Then
            If Workbooks("SourceWB"))).Worksheets("Source2")).Range("F" & a.row).Value = user Then
                item_1 = Workbooks("SourceWB"))).Worksheets("Source2")).Range("A" & a.row).Value
                item_2 = Workbooks("SourceWB"))).Worksheets("Source2")).Range("B" & a.row).Value
                item_3 = Workbooks("SourceWB"))).Worksheets("Source2")).Range("E" & a.row).Value
                entry = item_1 & "_" & item_2 & "_" & item_3
                init_as_entries.Add (entry)
            End If
        End If
    End If
Next a

takes 7 seconds with this speed enhancements:

Application.ScreenUpdating = False
Application.Calculation = xlManual
Application.EnableEvents = False
Application.DisplayStatusBar = False
ActiveSheet.DisplayPageBreaks = False
Sebastjan Hribar
  • 396
  • 3
  • 13
  • You can try using arrays to read/write in one go rather than row per row (explained here)[https://stackoverflow.com/questions/37620337/is-there-anyway-to-write-a-collection-straight-into-a-worksheet] . Apart from that, I don't see a monumental difference other than the amount of non-empty rows. However, you first check `If a.Value <> "" Then` but immediately in that if you check `If a.Value = ""...` how will that ever be true? I'm rather amazed it takes 10 seconds for an empty collection :P – Notus_Panda Apr 25 '23 at 12:38
  • I'm using collections this way to just grab the rows for a selected month and user. There might be empty rows in there as well, hence the check `<> ""` and only grab data from filled rows. – Sebastjan Hribar Apr 25 '23 at 13:30
  • I forgot to note that I already switch off calculations and screen updating. Without that it takes even longer. – Sebastjan Hribar Apr 25 '23 at 13:36
  • Finally, I've implemented the collection to array to worksheet as suggested in the above link and it doesn't decrease the time. Like you see in the edit it's the traversing, filtering and adding to a collection that slows it down. I have 1 filter less in the other loop, that is the only difference. – Sebastjan Hribar Apr 25 '23 at 13:53
  • I'm getting closer to the problem. I've copied all the data from SourceWB to a new unformatted workbook (i.e. paste values) and change the reference to this workbook in the macro and the entire procedure now takes 3 seconds!!! The original SourceWB has quite some color formatting and formulas. None that I collect though. It's not shared and not protected. What could be the root cause? – Sebastjan Hribar Apr 25 '23 at 14:14
  • The root cause to it working faster in a workbook where there isn't any formatting done? The same as when you compare doing the reading to array -> calculations/changes in array(s) -> array back to workbook. You're not needing as much to read non formatted cells as you do with your original wb, which is why FunThomas and I mentioned arrays since there you only read from excel once and work in the VBE 'till you're ready to write to Excel again. Especially conditional formatting can cause increased processing time for what I've encountered. – Notus_Panda Apr 25 '23 at 14:32

3 Answers3

4

To cleanup your code (this is basically not a speed issue), use Workbook and Worksheet variables:

Dim sourceWB As Workbook, sourceWs1 As Worksheet, sourceWS2 As Worksheet
Set sourceWB = Workbooks("SourceTR")
Set sourceWS1 = sourceWB.Worksheets("Source1")
Set sourceWS2 = sourceWB.Worksheets("Source2")

and similarly for the target Workbook/sheet.

Now for the speeding up: What VBA is slowing down is the interface between Excel and VBA. Therefore, the best you can do is to read all relevant data into an array and loop over that. Also, you should check what really is your last row of data. Read Find last used cell in Excel VBA for an in-detail discussion, I will use the most common method as example, check if that fits for you. The following code mimics the logic of step 3:

Dim lastRow As Long
lastRow = sourceWs1.Cells(sourceWs1.Rows.Count, "A").End(xlUp).Row
Dim sourceData As Variant
sourceData = sourceWs1.Range("A4:K" & lastRow)

Now you have a 2-dimensional array with a copy of the data of the first source sheet. And even if this is a rather large array, reading is not much slower than read only a single cell of data.

You can loop over the data in memory (which happens instantly, you will not be able to measure execution time)

Dim row As Long
For row = 1 To UBound(sourceData, 1)
    If sourceData(row, 1) <> "" Then
        If Month(sourceData(row, 1)) = selected_month Then
            Dim item_1, item_2, item_3, entry
            item_1 = sourceData(row, 5)   ' Col E
            item_2 = sourceData(row, 6)   ' Col F
            item_3 = sourceData(row, 11)  ' Col K
            entry = item_1 & "_" & item_2 & "_" & item_3

            If Not IsInCollection(init_tr_entries, entry) Then
                init_tr_entries.Add (entry)
           End If
        End If
    End If
Next row

Same is true for writing the data: Prepare an array with the data you want to write

ReDim targetData(1 To init_tr_entries.Count, 1 To 3)
Dim coll_Items() As String
For row = 1 To init_tr_entries.Count
    coll_Items = Split(init_tr_entries(row), "_")
    targetData(row, 1) = coll_Items(0)  ' You could also use a loop for that
    targetData(row, 2) = coll_Items(1)
    targetData(row, 3) = coll_Items(2)
Next

Now you have a 2-dimensional array of data that you can write in one go. I use an intermediate range variable, but this is just for better readability:

Dim targetRange As Range
Set targetRange = TargetWS.Range("A" & starting_row_1)
targetRange.Resize(UBound(targetData, 1), UBound(targetData, 2)).Value = targetData
FunThomas
  • 23,043
  • 3
  • 18
  • 34
0

I've finally pinpointed the root cause. I got there by using a fresh workbook as stated above. However, it's not the formatting and formulas, but the workbook name. I've referenced the workbook by its name in the code, whereas previously I used a lookup function to get the source workbook name.

The excel solution I'm trying to solve is a part of a larger set of workbooks, modules and functions and they are available (macros and workbooks) in several languages. Consequently, the workbook names change. I therefore call Workbooks(replaceYearInStr(getTranslation("some_wb_codename"))) to end up with Workbooks("SourceWB"). Depending on the backend language set one might get Workbooks("IzvorWB") for Slovenian.

The getTranslation function uses lookup to get the name from a language worksheet so it seems this one slows it down. I still don't want to hardcode the names, but at least I know what the problem is.

Sebastjan Hribar
  • 396
  • 3
  • 13
  • If `getTranslation` is slow you can "cache" the return value for any given input using (for example) a `Static` scripting dictionary within the function. – Tim Williams Apr 25 '23 at 15:55
  • The root cause was my stupidity, because I called `getTranslation` within the loop instead of assigning the value to a variable before the loop begins. Apologies and thank you all for helping me. I've fixed this and I'm down to 1.5 seconds. – Sebastjan Hribar Apr 25 '23 at 16:44
-1

Have you tried using Scripting.Dictionary objects instead of Collections? I have anecdotal evidence that they are significantly (maybe not orders of magnitude, but noticeably) faster than Collections in VBA. Use CreateObject("Scripting.Dictionary") to create the object(s) without having to set a Reference to the Microsoft Scripting Runtime, and assign it to a generic Object.

RobBaker
  • 137
  • 11
  • I doubt that this will speed up the process significantly. The time consuming part the IO between Excel and VBA. Therefore it is much more important to read all the data from the worksheet into an array and loop over that. Same for the result, the data should be collected in an array and write that back to the sheet all at once – FunThomas Apr 25 '23 at 13:29
  • Good point, well made - I usually read all of the (relevant) data on a sheet into a Variant and process from there, and then use a 2D array (of whatever type suits my data) to write back out when I'm done. – RobBaker Apr 26 '23 at 15:44