
I have a very large array in VBA which includes a lot of 0 values that I'd like to remove. Something like this:

A    B    C    12345
D    E    F    848349
G    H    I    0
J    K    L    0
M    N    O    0
P    Q    R    4352
S    T    U    0
V    W    X    0

I would like to be able to quickly/easily strip out all rows from this array that have a zero in the 4th column, resulting in something like this:

A    B    C    12345
D    E    F    848349
P    Q    R    4352

This array has 100,000 or so rows, which should hopefully come down to something closer to 20,000 or 30,000 rows after processing.

I assume iterating through every entry will prove very time-consuming.

Is there another way that is faster?

pnuts
gotmike
  • Did you test the actual performance of looping over the array? Should be pretty fast. – Tim Williams Oct 06 '15 at 23:02
  • Looping through that array should not take more than a few minutes... – emihir0 Oct 06 '15 at 23:26
  • @tim-williams - i have not tried looping, but it seems to be quite inefficient. even at that, i'm not sure i know how to "remove" one row from an array like this. any example would be helpful. – gotmike Oct 06 '15 at 23:29
  • @pnuts - not sure how to do this in VBA. do you have an example? – gotmike Oct 06 '15 at 23:31
  • @emihir0 - MINUTES? that sounds like an eternity. getting all the data from an external XML file doesn't take but 20 seconds if i write it in small chunks and go to the spreadsheet every time. – gotmike Oct 06 '15 at 23:34
  • When you say 'array', are you talking about rows of a worksheet that you want to remove? Or do you simply have an array variable that you want to reduce? – Ambie Oct 06 '15 at 23:35
  • @pnuts - ah, for sure if it's already written to the spreadsheet. i was hoping to do this BEFORE writing to the spreadsheet. in other words, store the data in an array, run the filter, then save the results to the spreadsheet. all off-sheet in VBA. – gotmike Oct 06 '15 at 23:36
  • @Ambie - an array variable. – gotmike Oct 06 '15 at 23:37
  • @gotmike I said minutes because it depends on what else you might want to do in the loops. Also, what you described sounds like a task that doesn't need to be done frequently (i.e. once a week to clean up the data), so whether it takes 10 seconds or 10 minutes does not really matter, as long as it is done correctly. If you need to process 100k rows of data frequently, neither VBA nor Excel is the right tool. – emihir0 Oct 06 '15 at 23:39
  • @emihir0 - this is part of an import process. so it's not done "regularly" but speed is a concern. i concur that VBA is not the best avenue and we may re-write this in C# as an XLL but we are stuck with Excel due to restrictions beyond my control. thanks for the clarification. – gotmike Oct 06 '15 at 23:42
  • Perhaps write the array to a spare sheet, use AutoFilter to remove the rows of zeroes, then reload the visible cells into that array variable? Turn calculation to manual before doing the above, and restore its mode afterward. – PatricK Oct 06 '15 at 23:51
  • The biggest slowdown you will experience is deleting a large number of non-contiguous rows from a worksheet. Conditionally rewrite the array to a new (smaller) array and then dump the data onto the worksheet. –  Oct 06 '15 at 23:53
  • The assumption _I assume iterating through every entry will prove very time-consuming_ is quite wrong. I ran a quick test on just the array handling part (generating a new array minus the `0` rows), it ran for >100,000 rows in less than 50ms. In my experience a well designed loop is often faster than other non-loopy ways of doing things. Getting data on and off a sheet can be quite fast too, when done properly. So the answer to your question _Is there another way that is faster?_ is: Probably not! – chris neilsen Oct 07 '15 at 02:32
  • Arrays are extremely fast, but you may find the Autofilter acceptable (and easier to implement) as in **[this answer](http://stackoverflow.com/questions/30959315/excel-vba-performance-1-million-rows-delete-rows-containing-a-value-in-less/30959316#30959316)** - Out of 1 million records it removes 100,000 rows in an average of 33 seconds. Let me know if you're interested and I can easily adjust it for your requirements – paul bica Oct 07 '15 at 04:05

2 Answers

3

I'm not aware of any other way in VBA than to loop through the array and write another array/list.

What makes it trickier is that your array looks to be two-dimensional, and VBA will only let you ReDim Preserve the last dimension. From the look of your data, you'd want to resize the first dimension as you iterate through your array.
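
A quick illustration of that restriction (a sketch, not from the original answer):

Dim a() As Variant
ReDim a(9, 3)
ReDim Preserve a(9, 5)    ' fine: only the last dimension changed
'ReDim Preserve a(19, 3)  ' run-time error 9: Preserve cannot resize the first dimension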

There are several solutions:

  1. Iterate your data twice - once to get the array size (and probably to store the relevant row numbers) and a second time to transfer the raw data into your new array.

  2. Iterate once and just reverse your dimensions (i.e. rows last) - there's a sketch of this after the list.

  3. Use an array of arrays, so that each array only has one dimension.

  4. Use a Collection which doesn't need to be dimensioned - this would be my preferred option.
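
For completeness, Option 2 might look something like this. It's a minimal sketch (not from the original answer), assuming the same zero-based source array named raw as below:

Dim tmp() As Variant
Dim r As Long
Dim kept As Long

' Rows go in the LAST dimension so the array can be trimmed afterwards
ReDim tmp(3, UBound(raw, 1))
kept = 0
For r = 0 To UBound(raw, 1)
    If raw(r, 3) <> 0 Then
        tmp(0, kept) = raw(r, 0)
        tmp(1, kept) = raw(r, 1)
        tmp(2, kept) = raw(r, 2)
        tmp(3, kept) = raw(r, 3)
        kept = kept + 1
    End If
Next
' Legal, because Preserve may resize the last dimension (assumes at least one row was kept)
ReDim Preserve tmp(3, kept - 1)

The trade-off is that tmp ends up transposed relative to the sheet layout, so it would need flipping back (or reading column-wise) before being written out.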

Option 4 would look like this (I've assumed your array is zero based):

Dim resultList As Collection
Dim r As Long

Set resultList = New Collection
' Single pass: keep only the rows whose fourth column is non-zero
For r = 0 To UBound(raw, 1)
    If raw(r, 3) <> 0 Then
        resultList.Add Array(raw(r, 0), raw(r, 1), raw(r, 2), raw(r, 3))
    End If
Next

If you have to write to a new array, then here's an example of Option 1:

Dim rowList As Collection
Dim result() As Variant
Dim r As Long
Dim c As Long
Dim v As Variant

' First pass: collect the row numbers worth keeping
Set rowList = New Collection
For r = 0 To UBound(raw, 1)
    If raw(r, 3) <> 0 Then
        rowList.Add r
    End If
Next

' Second pass: size the result exactly, then copy the kept rows across
ReDim result(rowList.Count - 1, 3) As Variant
c = 0
For Each v In rowList
    result(c, 0) = raw(v, 0)
    result(c, 1) = raw(v, 1)
    result(c, 2) = raw(v, 2)
    result(c, 3) = raw(v, 3)
    c = c + 1
Next
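
If the output then needs to go onto a worksheet, a zero-based array can be written in a single assignment. A sketch, assuming the result array above and an arbitrary target of A1 on the active sheet:

Dim ws As Worksheet

Set ws = ActiveSheet    ' assumption: write wherever suits your workbook
ws.Range("A1").Resize(UBound(result, 1) + 1, UBound(result, 2) + 1).Value = result

That one-shot write is far quicker than sending the rows to the sheet one at a time.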
Ambie
  • If I then want to write the entire Collection to a range, is that an easy process? I understood that the "pro" of using an array was that writing the data out to the spreadsheet was a one-step process. How does this work with a Collection? – gotmike Oct 07 '15 at 01:00
  • @gotmike, you can iterate a collection with a `For Each` loop, but you'd be right in thinking that writing an array is far quicker. That's why I've put the second piece of code in for you. I've tested it and it's still pretty quick, so that would be the option I'd take if you need to then write the output to a worksheet. – Ambie Oct 07 '15 at 01:42
  • FYI, I just tested the Option 1 code on 100,000 rows, reducing to 20,000 rows and it took 62ms. – Ambie Oct 07 '15 at 03:10
  • I went this route, iterating, and the speed was acceptable. Thanks! – gotmike Oct 15 '15 at 21:20
1

Okay, it's all off-sheet, so all the arrays are zero-based. To test this set-up, I created a worksheet with four columns, as per your data, using random numbers in the fourth column. I saved this to a text file (TestFile.txt), then read it back in to get a zero-based array (Excel ranges are 1-based when you take them into an array). I saved 150,000 rows to the text file to properly stress the routine. Yes, I have an SSD and that helps the ~2s run time, but I'd still expect it to run in under 10s on a spinning HDD.

Anyway, here's the code (requires a VBA reference to Microsoft Scripting Runtime purely to read in the file):

Public Function ReturnFilteredArray(arrSource As Variant, _
                                strValueToFilterOut As String) As Variant
Dim arrDestination      As Variant
Dim lngSrcCounter       As Long
Dim lngDestCounter      As Long

' Size the destination one bigger in each dimension so it can be filled 1-based
ReDim arrDestination(UBound(arrSource, 1) + 1, UBound(arrSource, 2) + 1)

lngDestCounter = 1
For lngSrcCounter = LBound(arrSource, 1) To UBound(arrSource, 1)
    ' Assuming the array dimensions are (100000, 3)
    If CStr(arrSource(lngSrcCounter, 3)) <> strValueToFilterOut Then
        ' Hit an element we want to include
        arrDestination(lngDestCounter, 1) = arrSource(lngSrcCounter, 0)
        arrDestination(lngDestCounter, 2) = arrSource(lngSrcCounter, 1)
        arrDestination(lngDestCounter, 3) = arrSource(lngSrcCounter, 2)
        arrDestination(lngDestCounter, 4) = arrSource(lngSrcCounter, 3)

        lngDestCounter = lngDestCounter + 1
    End If
Next

ReturnFilteredArray = arrDestination
End Function

Sub TestRun()
Dim fso As FileSystemObject
Dim txs As TextStream
Dim arr As Variant
Dim arr2 As Variant
Dim lngCounter As Long
Dim varFields As Variant

Debug.Print Now()
Set fso = New FileSystemObject
Set txs = fso.OpenTextFile("E:\Users\Thingy\Desktop\TestFile.txt", ForReading)
arr = Split(txs.ReadAll, vbNewLine)
ReDim arr2(UBound(arr), 3)

For lngCounter = 0 To UBound(arr) - 1
    ' Split each line once rather than four times
    varFields = Split(arr(lngCounter), vbTab)
    arr2(lngCounter, 0) = varFields(0)
    arr2(lngCounter, 1) = varFields(1)
    arr2(lngCounter, 2) = varFields(2)
    arr2(lngCounter, 3) = varFields(3)
Next

arr2 = ReturnFilteredArray(arr2, "0")
Range("L2").Resize(UBound(arr2, 1), 5) = arr2

Debug.Print Now()
End Sub

There are a number of assumptions in there, not least the dimensions. Note the difference in the second-dimension counter between arrDestination and arrSource. That's because Excel ranges are 1-based while normal arrays are 0-based.

Also, when writing out the array, I needed to bump the second dimension up to 5 in order to get all of the array onto the sheet. I wasn't able to trim off the empty elements, since ReDim Preserve only works on the last dimension (the columns here) and it's the first dimension (rows) that changes.
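
One caveat on the timing shown: Now() only resolves to whole seconds, so for runs this fast the Timer function gives a more useful reading. A sketch of the swap (not part of the original test):

Dim sngStart As Single

sngStart = Timer
' ... load, filter and write the array as above ...
Debug.Print "Elapsed: " & Format$(Timer - sngStart, "0.000") & "s"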

Anywho, this should serve as a reminder that despite its faults Excel is pretty amazing.

NeepNeepNeep
  • Your test timing includes reading data from a disk (as you said) and splitting it into an array (`Split` is a slowish function) - two things the OP doesn't talk about; he already has the data in an array. I ran a quick test on just the array handling part; it ran for >100,000 rows in less than 50ms – chris neilsen Oct 07 '15 at 02:29