
I have been given a work task where I'm to find and replace 8-digit numbers with corresponding new values coming from a 2-column table... basically a VLOOKUP, then replacing the old value with the new one...

The challenge I'm facing is that the 2-column lookup table is 882k rows, and there are about 120 million cells to replace (41,000 rows x 3,000 columns)...

I tried running some VBA code I found somewhere:

Option Explicit

Sub Replace_Overwrite()
    Dim LRow As Long, i As Long
    Dim varSearch As Variant

    ' Read the old/new value pairs from Sheet2 into an array
    With Sheets("Sheet2")
        LRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        varSearch = .Range("A2:B" & LRow)
    End With

    ' One full-sheet Replace pass per lookup pair
    With Sheets("Sheet1").UsedRange
        For i = LBound(varSearch) To UBound(varSearch)
            .Replace what:=varSearch(i, 1), replacement:=varSearch(i, 2), lookat:=xlWhole
        Next i
    End With
End Sub

I tried using this; it ran for 8 hours and then my work laptop crashed... I'm not sure anymore if this is still possible with MS Excel alone...

I wonder if anyone can help me with code that can process this. I can leave my system open over the weekend if it's stable and does work. It only has 8 GB RAM btw, running Excel 2013...

Fiona
    Turn off calculations and screen updating and it should run much faster – Andreas May 28 '21 at 16:00
  • What is the minimum and maximum of the old numbers ? – CDP1802 May 28 '21 at 16:26
  • This code is not really safe. The danger here is that you might accidentally make multiple replacements if any of the new codes also feature in the old codes. – Joffan May 28 '21 at 16:27
  • A small example (not 3000 columns) of what a replacement might look like would help to frame the problem a little. I find it hard to believe that these codes can occur *anywhere* in the table. Also: you should possibly consider making the replacement in a way that allows continuation; so setting a flag on any rows for which replacement has been completed, for example, and saving the result every so often. – Joffan May 28 '21 at 16:34

1 Answer


To speed things up, do as much as possible in memory and minimize the interaction between VBA and Excel (every round trip between VBA and the sheet is really slow).
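A minimal sketch of the difference (the range addresses and sheet name here are just placeholders, not your actual layout):

```vba
' Slow: one VBA<->Excel round trip per cell
Dim r As Long
For r = 1 To 41000
    Sheets("Sheet1").Cells(r, 1).Value = Sheets("Sheet1").Cells(r, 1).Value
Next r

' Fast: one round trip for the whole block, then pure in-memory work
Dim arr As Variant
arr = Sheets("Sheet1").Range("A1:A41000").Value    ' single bulk read
' ... modify arr in VBA ...
Sheets("Sheet1").Range("A1:A41000").Value = arr    ' single bulk write
```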

The following attempt reads the lookup list into a Dictionary and then processes the data column by column.

I did a test, creating 880,000 lookup rows and 40,000 x 100 cells of data. Building the dictionary took less than a minute; processing the columns took 3-4 seconds per column. I added logic so that after every 10 columns the whole workbook is saved. That increases the processing time, but ensures that after a crash you can more or less continue where you left off (the yellow color tells you where; just replace the 1 in `For col = 1` with the column where you want to restart).
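For example, if the last yellow-marked column after a crash was column 10, restarting is just a change to the loop start (the 11 here is of course whatever your own restart point turns out to be):

```vba
For col = 11 To colCount    ' resume at the first column not yet marked yellow
```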

I have added some DoEvents calls, which in theory slow the process down a little. The advantage is that you can see the output of the Debug.Print statements, and the Excel process is not displayed as unresponsive in the Task Manager.

To build the dictionary, I read the complete lookup data into an array at once. (If you are not familiar with Dictionaries: you need to add a reference to the Microsoft Scripting Runtime.)

Function createDict() As Dictionary
    Dim d As New Dictionary
        
    Dim rowCount As Long
    Dim list()
    Debug.Print Now, "Read data from Lookup sheet"
    With ThisWorkbook.Sheets(1)
        rowCount = .Cells(.Rows.Count, 1).End(xlUp).row
        list = .Range("A1:B" & rowCount).Value
    End With
        
    Debug.Print Now, "Build dictionary."
    
    Dim row As Long
    For row = 1 To UBound(list)
        If Not d.Exists(list(row, 1)) Then d.Add list(row, 1), list(row, 2)
        If row Mod 1000 = 0 Then DoEvents
    Next row
    
    Set createDict = d
End Function

As said, replacing the data is done column by column. Again, I read the whole column at once into an array, do the replacement on this array, and then write it back to the sheet.

Sub replaceAll()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    
    Dim d As Dictionary
    Set d = createDict
    
    Dim row As Long, col As Long
    Dim rowCount As Long, colCount As Long
    With ThisWorkbook.Sheets(2)
        rowCount = .Cells(.Rows.Count, 1).End(xlUp).row
        colCount = .Cells(1, .Columns.Count).End(xlToLeft).Column
        
        For col = 1 To colCount
            Debug.Print Now & "processing col " & col
            DoEvents
            
            Dim data
            data = .Range(.Cells(1, col), .Cells(rowCount, col))
            For row = 1 To rowCount
                If d.Exists(data(row, 1)) Then data(row, 1) = d(data(row, 1))
            Next row
            .Range(.Cells(1, col), .Cells(rowCount, col)) = data
            .Cells(1, col).Interior.Color = vbYellow
            
            If col Mod 10 = 0 Then ThisWorkbook.Save
        Next
    End With
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    
End Sub

One remark: you should consider using a database for this amount of data.
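If moving everything to a real database is not an option, a middle ground is to keep the 882k lookup pairs in a saved workbook and query them with ADO/SQL. The sketch below is an untested assumption, not part of the solution above: the file path and sheet name are placeholders, and it assumes the ACE OLEDB provider that ships with Office.

```vba
' Hypothetical sketch: query the lookup table via ADO instead of loading
' it into a Dictionary. Path, sheet name and provider are assumptions.
Sub LookupViaADO()
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=C:\data\lookup.xlsx;" & _
            "Extended Properties=""Excel 12.0 Xml;HDR=No"""
    ' With HDR=No the provider names the columns F1, F2, ...
    Set rs = cn.Execute("SELECT F2 FROM [Sheet1$] WHERE F1 = 12345678")
    If Not rs.EOF Then Debug.Print rs.Fields(0).Value   ' the replacement value
    rs.Close: cn.Close
End Sub
```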

FunThomas
  • Nice. FYI it can be faster to create multiple dictionaries when the total number of records get too large: https://stackoverflow.com/questions/67310436/optimizing-vba-code-running-time-finding-username-of-a-range-in-another-workshee/67312023#67312023 It's a little added complexity but still typically quicker overall. – Tim Williams May 28 '21 at 17:42
  • @TimWilliams: Does it only speed up *creating* the dictionary, or also the *lookup*? In other words: is 1.2m * (lookup in one large dict) significantly slower than 1.2m * (lookup in 10 smaller dicts)? – FunThomas May 28 '21 at 17:47
  • In my previous testing the load time far exceeded the lookup time, so it still worked out faster. I will use you test numbers and run a compare... – Tim Williams May 28 '21 at 17:48
  • Just tried: it's indeed much faster to limit the size of the dictionaries. Using 20 dictionaries, they were built within a few seconds, so that would save some time. However, processing the data didn't change (it took 4s for every column), so I would ignore the possible time saved when building the dictionary, because now the code looks much uglier... – FunThomas May 28 '21 at 18:08
  • When you created your test data did you use 8-digit numbers? I'm running tests and the dictionary creation is running incredibly slowly... – Tim Williams May 28 '21 at 18:28
  • Yeah (but I wrote the test data into a sheet first). Just used CLng(Rnd() * 10000000) – FunThomas May 28 '21 at 18:35
  • I’m using all 8-digit numbers... loading from the sheet to an array, then looping that – Tim Williams May 28 '21 at 19:12
  • Figured it out - adding keys as strings is much faster.... – Tim Williams May 28 '21 at 21:34
  • This is amazing, thank you so much @FunThomas! I have tried it, but I'm getting some problems. I left it running for 2 hours until it actually crashed; when I terminated Excel and re-opened it, only 10 columns had completed. I love the yellow highlight and auto-save, by the way. – Fiona May 30 '21 at 00:32
  • Do I have to save this file as a specific file type? It is initially xlsb; I'm trying xlsx right now and rerunning it. – Fiona May 30 '21 at 00:43
  • As I had the Macros in the same workbook, I have used xlsm – FunThomas May 31 '21 at 07:18
  • Sadly it still doesn't work properly for me after some trial and error; I'm now thinking it's my work laptop's specs/software that is causing the issue :( – Fiona May 31 '21 at 09:41