
There are a lot of questions regarding this issue. I read many of them and tried a few things but they don't fix my case.

I am trying to compare rows from two different (very long) sheets. If specific indices match, then specific cells (always the same columns in the current row) need to be copied from one sheet into the other.

It looks like this, just bigger (enlarged example):

Dim ArrayOne() As String
Dim ArrayTwo() As String
Dim i As Long, j As Long

ReDim ArrayOne(1 To AmountOfRowsSheet1)
ReDim ArrayTwo(1 To AmountOfRowsSheet2)

For i = 1 To AmountOfRowsSheet1
    ArrayOne(i) = Sheet1.Cells(i, ThisColumn).Value
Next i

For i = 1 To AmountOfRowsSheet2
    ArrayTwo(i) = Sheet2.Cells(i, ThatColumn).Value
Next i

For i = 1 To 4600
    For j = 1 To 69000

        If ArrayOne(i) Like "*" & ArrayTwo(j) Then
            Sheet1.Cells(i, 5).Value = Sheet3.Cells(i, 10).Value
            'the line above is repeated about 20 times, just with different columns,
            'so it potentially gets executed 4600*69000*20 times (6,348,000,000)
        End If

    Next j
Next i

The loops and everything are working, and it also copies correctly, but after a certain number of rows I run out of memory. In Task Manager I can see the used RAM tick up every few seconds. At some point Excel displays an error that it can't handle the next copy because of a lack of resources.

I tried:

Application.CutCopyMode = False '(at each restart of the loop)

Creating an empty data object and putting it into the clipboard.

and a few user32.dll fixes I found.
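
For reference, the typical user32.dll fix floating around for this is clearing the clipboard through the Windows API. A rough sketch of that pattern (not my exact code; the declarations assume 64-bit Office):

'module-level declarations for the clipboard API (use plain Declare without PtrSafe/LongPtr on 32-bit Office)
Private Declare PtrSafe Function OpenClipboard Lib "user32" (ByVal hwnd As LongPtr) As Long
Private Declare PtrSafe Function EmptyClipboard Lib "user32" () As Long
Private Declare PtrSafe Function CloseClipboard Lib "user32" () As Long

Sub ClearClipboard()
    'empty the Windows clipboard so Excel releases the memory held for copy operations
    If OpenClipboard(0) <> 0 Then
        EmptyClipboard
        CloseClipboard
    End If
End Sub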

DasDing
  • More of a [mcve] would help. My guess is that the problem lies in code that you haven't posted, and that this code could be improved if you read [How to avoid using Select in Excel VBA](https://stackoverflow.com/q/10714251/4996248). But, since you haven't shown some of the relevant code, this is just a guess. One thing you could look into is to stop cell-by-cell processing and instead read entire ranges into VBA arrays, process those arrays, and write them back to the spreadsheet, with the actual reading and writing requiring single statements rather than loops. – John Coleman Apr 15 '21 at 11:06
  • Haven't tried this, so no idea if it works - if you run your code from a master workbook, and run for xRows, then save & close both slave workbooks, does that reset the memory being used? Then you can resume the operation from xRow. [and eventually chunk it up so you would resume from n. xRows] – Amiga500 Apr 15 '21 at 11:13
  • @John - if it's a large dataset, he'll have to jump through fancy hoops as arrays are limited to ~64000 entries IIRC. – Amiga500 Apr 15 '21 at 11:14
  • @Amiga500 Recent Excel doesn't have that array limit. I worked with bigger arrays. – Pᴇʜ Apr 15 '21 at 11:18
  • Really? Brilliant - is there a new upper limit or is it completely gone? – Amiga500 Apr 15 '21 at 11:24
  • @Amiga500 Limited by available memory. I recommend [Excel specifications and limits](https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3). But in 32 bit Office there is a general 2 GB memory limit. – Pᴇʜ Apr 15 '21 at 11:28
  • Hi to all of you, thanks for the help. I actually think no other code is needed: I don't use any Selects in my code as far as I can tell (Ctrl+F to make sure), and for the comparison I already use arrays to be as quick as possible. But since it's 69000*4600 rows to compare and potentially write, it's just a lot. So maybe a faster / less RAM-heavy method of copying is all I need? – DasDing Apr 15 '21 at 11:29
  • @DasDing But we think there is more code needed. As your code is no [mcve], we cannot reproduce the issue. So please at least turn the code into an example that we can run to reproduce the issue. The code you show does not produce any issues yet. • Also include whether you use 64 or 32 bit Excel (note that even on 64 bit Windows you might have a 32 bit Excel running, so make sure you check it in Excel). Further, which file type are you using? I ask because all of that might influence any limits. – Pᴇʜ Apr 15 '21 at 11:31
  • @PEH - that Excel spec sheet is useless - if MS think that operating on a VBA array of size 10 is the same as an array of size 70,000 they need their heads examined. Try applying any WorksheetFunction to the two and spot the differences. Completely undocumented by them, of course. I know those problems applied in 2007 and 2013 as I hit them - and I've no reason to believe it's any different in 2019 or 365, given MS's specs indicate they don't believe the former two had issues! The useful limit is still 65,536 unless someone can absolutely confirm different (i.e. no unexpected behaviour!). – Amiga500 Apr 15 '21 at 11:43
  • I enlarged the code example :) – DasDing Apr 15 '21 at 11:44
  • @DasDing you want to run a `Like` operator over `317,400,000` iterations, including a `.Value = .Value` transfer? How many years did you plan this to run? Just as an example: if every iteration takes only `0.0001` seconds, this will run for almost 9 hours. – Pᴇʜ Apr 15 '21 at 11:47
  • + ArrayTwo(y) what happens with this y? – EvR Apr 15 '21 at 11:49
  • @Pᴇʜ unfortunately, I guess so. If the lines match I need to pull together several cells from 2 sheets into a third one, and yeah, the matching needs to be checked and then the transfer. I am afraid there is no way around that – DasDing Apr 15 '21 at 11:52
  • @Amiga500 Well, OP was not talking about worksheet functions. And still there is no fixed limit, but you might run into the memory limit. Of course some worksheet functions might have their own limits. I didn't say you can use arrays as big as you want. But the limit you mentioned does not exist. – Pᴇʜ Apr 15 '21 at 11:52
  • @EvR sorry, typo, it's supposed to be (j), I will edit it – DasDing Apr 15 '21 at 11:52
  • @DasDing Note that you can read a whole range into an array like `ArrayOne = Range("A1:B15").Value` or a column like `ArrayOne = Columns("A").Value`. No loop needed for that. – Pᴇʜ Apr 15 '21 at 11:55
  • @PEH - agree with the premise - you can indeed build an array of size limited only by memory. But since MS can't be arsed properly documenting their SW's limitations we at least shouldn't propagate the idea that there are no "calculation specifications and limits" to arrays of size >65536 elements. Otherwise, there will be some poor bollocks like me 10 years ago wondering why the fk their code isn't working when nothing anywhere online was indicating it should have problems. – Amiga500 Apr 15 '21 at 12:01

2 Answers


Not an answer to the question but instead of:

For i = 1 to AmountOfRowsSheet1
   ArrayOne(i) = Sheet1.Cells(i, ThisColumn)
next i

try:

ArrayOne = Range(Cells(1, ThisColumn), Cells(AmountOfRowsSheet1, ThisColumn)).Value

ArrayOne will be a 2D array (declare it as Variant rather than String() for this to work), with data starting at (1,1) and incrementing to (n,1)...

It's much quicker to get the data this way, and the same approach can be used for putting an array back into a worksheet - also miles quicker than a for loop.
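
For the write-back direction, a minimal sketch (reusing the question's Sheet1 / ThisColumn / AmountOfRowsSheet1 placeholders, so treat the names as assumptions):

Dim ArrayOne As Variant
ArrayOne = Sheet1.Range(Sheet1.Cells(1, ThisColumn), Sheet1.Cells(AmountOfRowsSheet1, ThisColumn)).Value

'... modify ArrayOne(row, 1) in memory here ...

'write the whole block back in a single statement instead of a cell-by-cell loop
Sheet1.Cells(1, ThisColumn).Resize(UBound(ArrayOne, 1), UBound(ArrayOne, 2)).Value = ArrayOne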

Edit: Again, not a direct answer to the question, but consider this:

import random, string

# ---------------------------------------------------------
# This part just generates random data to compare against each other (and, in the case of
# lists 5 & 6, stands in for the data on the sheets in Sheet1.Cells(i,5) & Sheet3.Cells(i,10))
N1 = 6
list2 = []

list5 = []  # this would correspond to the existing values in Sheet1.Cells(i,5)
list6 = []  # and this to Sheet3.Cells(i,10)
for i1 in range(0, 4600):
    list2.append(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N1)))
    list5.append(5)
    list6.append(10)

N2 = 12
list3 = []
for i1 in range(0, 69000):
    list3.append(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N2)))

list2[0] = "$$$$$$"         # just setting two values so we can check the method works
list3[10] = "££££££$$$$$$"  # contains list2[0] as a substring, so row 0 should get a match

# ---------------------------------------------------------

# This part is actually doing what you're trying to do in VBA
list4 = []
for ij1, j1 in enumerate(list2):
    found = False
    for j2 in list3:
        if j1 in j2:
            found = True
            break
    if found:
        list4.append(list6[ij1])  # index with the row counter, not the string itself
    else:
        list4.append(list5[ij1])

The bit you're interested in runs in around 25 seconds. Absolutely no fancy code-work needed. Go look at downloading Anaconda. You'd probably be quicker reading your two Excel files into Python, doing your operations, and writing them back out again than trying to do it purely in VBA.

Amiga500

I turned your example into a version that shows how you would work with arrays:

Option Explicit

Sub Example()
    Dim ArrayOne() As Variant
    Dim ArrayTwo() As Variant
    
    ArrayOne = Sheet1.Columns(1).Value 'read column 1 into array
    ArrayTwo = Sheet2.Columns(2).Value 'read column 2 into array
    
    Dim start
    start = Timer
    
    Dim i As Long
    For i = 1 To 4600

        Dim j As Long
        For j = 1 To 69000
            If ArrayOne(i, 1) Like "*" & ArrayTwo(j, 1) Then
                Sheet1.Cells(i, 5).Value = Sheet3.Cells(i, 10).Value
            End If
        Next j
        
        Debug.Print i, start, Timer, "Runtime=" & Timer-start

        Stop 'we want to test time of one iteration = 23 seconds
    Next i
End Sub

This example ran 23 seconds (on my computer) for one full pass of the inner j loop (i.e. one iteration of the outer i loop). So in total it would run 23*4600 seconds, which is about 30 hours.

So either you strip down the data that needs to be processed, or you use something other than Excel VBA to make it faster. Or you change your entire approach.

Note that VBA is limited to single threading. So no matter how many cores your CPU has VBA will only use one. That makes it actually a pretty bad tool for processing big data.


Actually, what you need to get rid of are the read/write actions on the cells:

Sheet1.Cells(i, 5).Value = Sheet3.Cells(i, 10).Value

Whenever you access a cell value it slows things down a lot. Without that line the loop runs in 2 seconds instead of 23 (still a total runtime of about 2.5 hours). So there is potential to make this faster, but probably not much faster than 2.5 hours.
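
One way to avoid the per-cell writes is to collect the results in another array and write that back in a single statement at the end. A rough sketch (ArrayThree is an assumed third array holding Sheet3 column 10, read the same way as ArrayOne/ArrayTwo):

'read the existing column 5 values so rows without a match keep their current content
Dim Result As Variant
Result = Sheet1.Range("E1").Resize(4600, 1).Value

'assumed: Sheet3 column 10 read into an array like ArrayOne/ArrayTwo
Dim ArrayThree As Variant
ArrayThree = Sheet3.Columns(10).Value

Dim i As Long, j As Long
For i = 1 To 4600
    For j = 1 To 69000
        If ArrayOne(i, 1) Like "*" & ArrayTwo(j, 1) Then
            Result(i, 1) = ArrayThree(i, 1)
            Exit For    'the copied value does not depend on j, so one match is enough
        End If
    Next j
Next i

'one single range write instead of up to 4600 cell writes
Sheet1.Range("E1").Resize(4600, 1).Value = Result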

If you cannot get rid of the multiple read/write actions, then even turning off calculation with Application.Calculation = xlCalculationManual before going into the loop brings an immense boost. Just don't forget to turn it back on with Application.Calculation = xlCalculationAutomatic at the end. Note that turning off calculation only works if you have no formulas that need to be recalculated while your loop runs (otherwise you get faulty results).
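
A minimal sketch of that wrapping (the ScreenUpdating toggle is an extra that usually helps too, not something required for the calculation trick):

Sub RunWithManualCalculation()
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False   'optional extra: also reduces overhead while cells are written

    '... run the i/j loops from above here ...

    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub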

I recommend improving your real code as shown above and checking the runtime of one full run of the inner j loop, as I did with the Stop command. That way you can easily estimate the entire runtime by multiplying by 4600.

Pᴇʜ
  • Exploring other tooling would definitely be a better option. Python and pandas would be ideal for this, I'd think, as you could do it all in one vectorized operation. Likely to take a job of hours and turn it into one that completes before your finger has come off the mouse button... – Amiga500 Apr 15 '21 at 12:12
  • @Amiga500 Yes, actually any other language capable of multi-threading will shorten the runtime a lot with every additional core. Furthermore, VBA was just not made for processing data. – Pᴇʜ Apr 15 '21 at 12:15