5

I currently have code to allow me to look through the rows with matching ID from Sheet 1 and Sheet 2. When both IDs match, Sheet 2 information will be pasted to the Sheet 1 rows with the same IDs. My code works on less than 1,000 rows and when I tested it gave results within a minute.

The problem is that when I tried to run it for 1,000,000 rows it keeps running and for more than 20 minutes and never stop running since then. I hope anyone could assist me in making changes to the code to allow me to do a loop and copy paste the information from Sheet 2 to Sheet 1 for 200,000 rows.

Sub Sample()


  Dim tracker As Worksheet
    Dim master As Worksheet
    Dim cell As Range
    Dim cellFound As Range
    Dim OutPut As Long

   Set tracker = Workbooks("test.xlsm").Sheets("Sheet1")
   Set master = Workbooks("test.xlsm").Sheets("Sheet2")

   Application.ScreenUpdating = False
    For Each cell In master.Range("A2:A200000")

        Set cellFound = tracker.Range("A5:A43000").Find(What:=cell.Value, LookIn:=xlValues, LookAt:=xlWhole)
        If Not cellFound Is Nothing Then
      matching value

            cellFound.Offset(ColumnOffset:=1).Value2 = cell.Offset(ColumnOffset:=2).Value2


        Else

        End If
        Set cellFound = Nothing
        Debug.Print cell.Address
    Next
    Application.ScreenUpdating = True
    OutPut = MsgBox("Update over!", vbOKOnly, "Update Status")


End Sub

Above is the code that I have for now.

pnuts
  • 58,317
  • 11
  • 87
  • 139
nabilah
  • 53
  • 1
  • 8
  • 1
    For starters, writing 200,000 cells addresses to the VBE's Immediate window via `Debug.Print cell.Address` is going to negatively impact performance. Actually, writing 200,000 of *anything* to *anywhere* when it is not important is going to negatively impact performance. –  Sep 09 '15 at 04:39
  • 1
    Load the values from the tracker sheet into a dictionary object, with the values as keys and the row numbers as values. Read the whole A2:B200000 range into a variant array and loop over that, checking the dictionary for matches: when you find a match copy the value from the second "column" of the array to the tracker sheet at the row you found from the dictionary object. – Tim Williams Sep 09 '15 at 04:51
  • 1
    Similar question: http://stackoverflow.com/questions/24998958/excel-macro-to-compare-and-copy-data-from-one-sheet-to-another-taking-a-long-tim/24999397#24999397 – Tim Williams Sep 09 '15 at 04:55
  • Same approach as @TimWilliams, except I'd copy both sheets to arrays – paul bica Sep 09 '15 at 05:18
  • To put the numbers from your narrative into perspective, you say that you can run 1000 values through in 'less than a minute'. Round that up to a minute. 1,000,000 rows is 1000² so even without accounting for performance degradation on the larger data set, that means that 1,000,000 rows will take 1000 minutes or 16 hours and 40 minutes. –  Sep 09 '15 at 05:59
  • Do you mean i should remove the Debug.Printcell.Address? @Jeeped – nabilah Sep 09 '15 at 06:10
  • Thanks i will take a look at the similar question which you have given :) @TimWilliams – nabilah Sep 09 '15 at 06:11
  • What do you mean copy it to arrays? @paulbica – nabilah Sep 09 '15 at 06:11
  • That would a start. I'm working on a revised method for you involving a combination of variant arrays and a scripting dictionary that should improve performance. Testing is taking a while. –  Sep 09 '15 at 06:19
  • Alright thank you! :) @Jeeped – nabilah Sep 09 '15 at 06:29

3 Answers3

6

Incorporating @paulbica's suggestion, this ran in several seconds for me.

Sub Sample()

    Dim rngTracker As Range
    Dim rngMaster As Range
    Dim arrT, arrM
    Dim dict As Object, r As Long, tmp

    With Workbooks("test.xlsm")
        Set rngTracker = .Sheets("Tracker").Range("A2:B43000")
        Set rngMaster = .Sheets("Master").Range("A2:C200000")
    End With

    'get values in arrays
    arrT = rngTracker.Value
    arrM = rngMaster.Value

    'load the dictionary
    Set dict = CreateObject("scripting.dictionary")
    For r = 1 To UBound(arrT, 1)
        dict(arrT(r, 1)) = r
    Next r

    'map between the two arrays using the dictionary
    For r = 1 To UBound(arrM, 1)
        tmp = arrM(r, 1)
        If dict.exists(tmp) Then
            arrT(dict(tmp), 2) = arrM(r, 3)
        End If
    Next r

    rngTracker.Value = arrT

End Sub
Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • 1
    Hi i tried the code you gave however it does not work when Sheet 1 is in a pivot table, is there any additional code i would need to do in order for the code to work on pivot table? @TimWilliams – nabilah Sep 09 '15 at 07:38
  • 1
    @nabilah. What! Sheet1 (tracker) data cannot be updated if it is in a pivot table! You might be able to update the range that the pivot table is based on to have the value looked up from sheet2 (master) and then change the definition of the pivot table to display this new value. – HarveyFrench Sep 09 '15 at 08:53
  • 1
    Thanks for this @Tim, I learn a lot. I posted an answer too (a coy of yours) which a couple of points and slightly easier to understand variable names. (I found replacing dict with dictTracker helped me to understand it more easily.) Nice code. – HarveyFrench Sep 09 '15 at 10:07
  • 1
    You even included a tmp for efficiency ! – paul bica Sep 09 '15 at 12:32
  • Sorry I dont get what you are trying to say. Do you mean i have to remove the pivot table and run the code and after that do a pivot table again? @HarveyFrench – nabilah Sep 10 '15 at 02:20
2

You could use the index of a Dictionary object and use its native indexing properties to perform the lokups. I'm not sure just how well that will perform in a data set of 200K records where a high report of failure was going to occur and you are showing at least a 78% failure rate (200K records to match and update 43K records).

Sub Sample3()
    Dim tracker As Worksheet, master As Worksheet
    Dim OutPut As Long
    Dim v As Long, p As Long, vMASTER As Variant, vTRACKER As Variant, dMASTER As Object

    Set tracker = Workbooks("test.xlsm").Sheets("Sheet1")
    Set master = Workbooks("test.xlsm").Sheets("Sheet2")
    Set dMASTER = CreateObject("Scripting.Dictionary")

    Debug.Print Timer
    'Application.ScreenUpdating = False '<~~no real need to do this if working in memory

    With tracker
        vTRACKER = .Range(.Cells(5, 2), .Cells(Rows.Count, 1).End(xlUp)).Value2
    End With

    With master
        vMASTER = .Range(.Cells(2, 1), .Cells(Rows.Count, 3).End(xlUp)).Value2
        For v = LBound(vMASTER, 1) To UBound(vMASTER, 1)
            If Not dMASTER.exists(vMASTER(v, 1)) Then _
                dMASTER.Add Key:=vMASTER(v, 1), Item:=vMASTER(v, 3)
        Next v
    End With

    For v = LBound(vTRACKER, 1) To UBound(vTRACKER, 1)
        If dMASTER.exists(vTRACKER(v, 1)) Then _
            vTRACKER(v, 2) = dMASTER.Item(vTRACKER(v, 1))
    Next v

    With ThisWorkbook.Sheets("Sheet1")  'tracker
        .Cells(5, 1).Resize(UBound(vTRACKER, 1), 2) = vTRACKER
    End With

    'Application.ScreenUpdating = True '<~~no real need to do this if working in memory
    Debug.Print Timer
    OutPut = MsgBox("Update over!", vbOKOnly, "Update Status")

    dMASTER.RemoveAll: Set dMASTER = Nothing
    Set tracker = Nothing
    Set master = Nothing

End Sub

Once both ranges are mirrored into variant arrays, a dictionary is created in order to fully utilize its indexing properties for identification.

The above shows about a significant increase in efficiency over 200K records in master vs 43K records in tracker.

btw, I did use an .XLSB for this; not an .XLSM.

  • 1
    Alright thank you! Really appreciate your help :) @Jeeped – nabilah Sep 09 '15 at 07:13
  • 1
    If you have time, please offer up some speed trial times of before and after for the benefit of others (after both offerings of course). I'm very interested in how the two responses worked on actual data vs. the randomized sample data I created. –  Sep 09 '15 at 07:15
  • 1
    Alright i will try :). However do you know any additional code that i would need to use if the rows are in the pivot table? – nabilah Sep 09 '15 at 07:50
  • 1
    tbh, I really didn't take a pivot table into consideration while writing the above code. That might have been something to mention. –  Sep 09 '15 at 07:51
  • Its Okay, but most codes does not work on the pivot table unless some additional code is added. But i am unsure of what code that is needed to add in @Jeeped – nabilah Sep 09 '15 at 08:00
  • 2
    @Jeepad Typo error `Set master = Workbooks("test.xlsm").Sheets("Sheet2)` to be as `Set master = Workbooks("test.xlsm").Sheets("Sheet2")` – skkakkar Sep 09 '15 at 09:50
  • ++ great implementation! I think populating the dictionary might be the slowest part, but I'm curious to see the difference before and after – paul bica Sep 09 '15 at 12:21
  • Add a reference to **Microsoft Scripting Runtime** (**Tools -> References...**). Then you can write `Dim dMaster As New Scripting.Dictionary` instead of `Dim dMaster As Object: Set dMaster = CreateObject("Scripting.Dictionary")`. – Zev Spitz Sep 09 '15 at 21:48
  • @ZevSpitz - My code was specifically written using the [CreateObject function](https://msdn.microsoft.com/en-us/library/office/gg264813.aspx) as I wished to avoid situations like [this](http://stackoverflow.com/questions/29384575/combine-rows-sum-values-in-a-worksheet/29386035#29386035) or [this](http://stackoverflow.com/questions/31891059/extracting-the-collection-of-unique-values-from-a-filter-in-vba/31891475#31891475). –  Sep 09 '15 at 21:57
  • _situations like this_: That someone might misread **Microsoft Scripting Runtime** as **Windows Scripting Runtime**? (I'm not sure what is the second situation you are referring to.) My own feeling is that it provides Intellisense to users who are unfamiliar with the Dictionary, and is worth a little explanation; but I can agree to disagree. – Zev Spitz Sep 09 '15 at 22:10
  • Is there any particular reason to write `dMASTER.Add Key:=vMASTER(v, 1), Item:=vMASTER(v, 3)` instead of `dMASTER(vMASTER(v, 1)) = vMASTER(v, 3)`? – Zev Spitz Sep 09 '15 at 22:16
  • And `vTRACKER(v, 2) = dMASTER.Item(vTRACKER(v, 1))` instead of `vTRACKER(v, 2) = dMASTER(vTRACKER(v, 1))`? – Zev Spitz Sep 09 '15 at 22:18
  • @ZevSpitz - Beyond a verbose method of building (and extracting) the dictionary, no there is not. I used the `.Exists` method instead of a forced write (and possible overwrite) because I felt it explained the process a little better. Same for extracting the `.Item` from a match. –  Sep 09 '15 at 22:21
2

It might also be faster to use ADODB.

Dim filepath As String
Dim conn As New ADODB.Connection
Dim sql As String

filepath = "c:\path\to\excel\file\book.xlsx"

With conn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=""" & filepath & """;" & _
        "Extended Properties=""Excel 12.0;HDR=No"""

    sql = _
        "UPDATE [Sheet1$A2:B200000] AS master " & _
        "INNER JOIN [Sheet2$] AS tracker ON master.F1 = tracker.F1 " & _
        "SET master.F2 = tracker.F2"
    .Execute sql
End With

This works with Office 2007. Office 2010 (I haven't tested on 2013) has a security measure that prevents updating Excel spreadsheets with an SQL statement. In this case you can either use the old Jet provider, which doesn't have this security measure. This provider doesn't support .xlsx, .xlsm or .xlsb files; only .xls.

With conn
    .Provider = "Microsoft.Jet.OLEDB.4.0"
    .ConnectionString = "Data Source=""" & filepath & """;" & _
        "Extended Properties=""Excel 8.0;HDR=No"""

Alternatively, you can read the resulting data into a disconnected recordset and paste the recordset into the original worksheet:

Dim filepath As String
Dim conn As New ADODB.Connection
Dim sql As String
Dim rs As New ADODB.Recordset

filepath = "c:\path\to\excel\file\book.xlsx"

With conn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=""" & filepath & """;" & _
        "Extended Properties=""Excel 12.0;HDR=No"""

    sql = _
        "SELECT master.F1, IIF(tracker.F1 Is Not Null, tracker.F2, master.F2) " & _
        "FROM [Sheet1$A2:B200000] AS master " & _
        "LEFT JOIN [Sheet2$] AS tracker ON master.F1 = tracker.F1 "

    rs.CursorLocation = adUseClient
    rs.Open sql, conn, adOpenForwardOnly, adLockReadOnly
    conn.Close
End With

Workbooks.Open(filepath).Sheets("Sheet1").Cells(2, 1).CopyFromRecordset rs

If using CopyFromRecordset, keep in mind that there is no guarantee of the order in which the records are returned, which might be a problem if there is other data in the master worksheet besides columns A and B. To resolve this, you can include those other columns in the recordset as well. Alternatively, you can enforce the order of the records using an ORDER BY clause, and sort the data in the worksheet before you begin.

Community
  • 1
  • 1
Zev Spitz
  • 13,950
  • 6
  • 64
  • 136