11

I would like to fill an array in VBA with the row numbers of only rows which meet a certain criteria. I would like the fastest method possible (for example, something like RowArray = index(valRange=valMatch).row)

Below is the code for the (slow) range loop.

Current Code

Sub get_row_numbers()

Dim RowArray() As Long
Dim valRange As Range
Dim valMatch As String

Set valRange = ActiveSheet.Range("A1:A11")
valMatch = "aa"
ReDim RowArray(WorksheetFunction.CountIf(valRange, valMatch) - 1)

For Each c In valRange
    If c.Value = valMatch Then RowArray(x) = c.Row: x = x + 1
Next c    
End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
Steve
  • 1,620
  • 2
  • 19
  • 33

8 Answers8

12

Still around 2-3 times the time of the efficient variant array from Chris, but the technique is powerful and has application beyond this question

One point to note is that Application.Transpose is limited to 65536 cells, so a longer range needs to be "chunked" into pieces.

Sub GetEm()
Dim x
x = Filter(Application.Transpose(Application.Evaluate("=IF(A1:A50000=""aa"",ROW(A1:a50000),""x"")")), "x", False)
End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
  • 2
    + 1 I have to agree with @Reafidy here :) – Siddharth Rout Oct 25 '12 at 11:53
  • 3
    I really enjoyed your answer and learned a lot from this. You may be able to make this approach faster by doing something like this `x = Filter(ActiveSheet.Evaluate("=TRANSPOSE(IF(A1:A50000=""aa"",ROW(A1:a50000),""x""))"), "x", False)` I know Charles Williams is active on this thread, but he mentioned using Worksheet.Evaluate can be quicker than Application.Evaluate in [this post.](http://fastexcel.wordpress.com/2011/11/02/evaluate-functions-and-formulas-fun-how-to-make-excels-evaluate-method-twice-as-fast/) – deusxmach1na Oct 26 '12 at 17:21
  • 1
    @deusxmach1na Thx for the kinds words and interesting link. While on I could replicate the time differential on testing the examples from Charles as is from his article, it made no difference to my sample. I will dig further. Rgds brettdj – brettdj Oct 26 '12 at 23:47
8

First copy the range to a variant array , then loop over the array

Arr = rngval
For I = 1 to ubound(arr)
    If arr(I,1) = valMatch Then RowArray(x) = I: x = x + 1
Next
Searush
  • 628
  • 1
  • 5
  • 18
chris neilsen
  • 52,446
  • 10
  • 84
  • 123
  • 1
    Oh, I like it, still looping, but definate imrpovement, ty +1 – Steve Oct 16 '12 at 18:59
  • 2
    +1 Nice one. @The_Barman: The fastest method that I can think of is to use Autofilter to filter your range and then copy them to an Array (and then use the method shown by Chris) – Siddharth Rout Oct 24 '12 at 04:48
  • @Siddharth See my new answer on the performance of the posted methods. I'm not sure an Autofilter will improve on the simple loop, because you will get the result back as a multi area range and have to process these ranges seperately (unless you have a clever way to parse the results in one pass) – chris neilsen Oct 24 '12 at 10:29
  • 1
    `unless you have a clever way to parse the results in one pass` Yup there is :) – Siddharth Rout Oct 24 '12 at 10:30
  • 1
    I'm not familiar with the "unbound" function - is that supposed to be UBound? – Stepan1010 Oct 26 '12 at 15:42
4

There is an assumption in the question title: that a looping solution is slow and a non-looping solution is faster. So, I conducted some comparisons to check that.

Test Case

I created some sample data consisting of 50,000 samples, and 50% matching values. For the fastest methods I created two more sample sets, again with 50,000 rows and one with 10% matching rows, another with 90% matching row.

I ran each of the posted methods over this data in a loop, repeating the logic 10 times (so times are for processing a total of 500,000 rows).

                  50%        10%        90%  
ExactaBox        1300       1240       1350  ms
Scott Holtzman 415000         
John Bustos     12500       
Chris neilsen     310        310        310
Brettdj           970        970        970
OP               1530       1320       1700

So the moral is clear: just because it includes a loop, doesn't make it slow. What is slow is access the worksheet, so you should make every effort to minimise that.

Update Added test of Brettdj's comment: single line of code

For completeness sake, here's my solution

Sub GetRows()
    Dim valMatch As String
    Dim rData As Range
    Dim a() As Long, z As Variant
    Dim x As Long, i As Long
    Dim sCompare As String

    Set rData = Range("A1:A50000")
    z = rData
    ReDim a(1 To UBound(z, 1))
    x = 1
    sCompare = "aa"
    For i = 1 To UBound(z)
        If z(i, 1) = sCompare Then a(x) = i: x = x + 1
    Next
    ReDim Preserve a(1 To x - 1)    
End Sub
chris neilsen
  • 52,446
  • 10
  • 84
  • 123
  • 1
    +1. Thanks for taking the time to do this. Kind of an enlightening exercise. Probably will change the way I code from now on. – Scott Holtzman Oct 24 '12 at 13:14
  • 1
    Thanks Chris for the speed test. Yes, there's a common theme among the answers, not because we necessarily think looping an array is slow, but because the question specifically asks, "without looping through them." – ExactaBox Oct 24 '12 at 13:32
  • Love the work, for me is all about the fastest way to fill my array. – Steve Oct 24 '12 at 19:20
  • @ExactaBox I overlooked that part of the question title, updated to reflect it. – chris neilsen Oct 24 '12 at 19:22
  • The "them" in the question refers to the range not so much the array of variables, though I didn't understand there being such a considerable difference in the two loop methods. A real eye opener this one, ty chris. (erm silly question, first bounty... all 100 have to go to one answer, cannot be shared?) – Steve Oct 24 '12 at 19:31
  • 1
    The_Barman, your last 2 comments don't fly... your first comment to chris's initial answer to loop through an array is that it's an improvement, but "still looping." And you created a bounty even though that answer existed. So you were very clearly asking for a non-looping answer, not the fastest answer. – ExactaBox Oct 24 '12 at 19:40
  • ExactaBox... I am after the fastest method (with confidence that there is not a faster one)... I assumed that looping through the variables would be slow. I had only one answer with no "best case non-loop" to compare against. So I raised a Bounty to be certain of the best answer. If I could have tested the first (and only) answer against a non loop solution I would not have raised the Bounty. I cannot know the answer to my own question without asking it, surely? – Steve Oct 24 '12 at 19:50
  • 1
    I'm getting a subscript out of range error on ReDim Preserve a(1 To x - 1) – Stepan1010 Oct 26 '12 at 15:46
  • The subscript out of range is likely due to no matches... the method is case sensitive (caught me out that one while testing) – Steve Oct 26 '12 at 20:41
  • Just for completeness: Chris, can you add the timing for my original method too to the posted times. Thanks. – Steve Oct 26 '12 at 20:44
  • I have tested all methods on the actual data and have found your timings to be consistent, I did find deus's comment on brettdjs's post to be about 30% faster then Bretts, but still the fastest is this one by a factor of 2. This has been a real eye opener. – Steve Oct 27 '12 at 09:02
3

Building off what others have offered here, I've combined both methods along with some string manipulation to get the exact row numbers of any given range containing the desired match without looping.

The only note that differs from your code is that RowArray() is a String type. However, you could convert it to Long using CLng as you strip numbers out as needed, if you need to do that.

Sub get_row_numbers()

Dim rowArray() As String, valRange As Range, valMatch As String
Dim wks As Worksheet, I As Long, strAddress As String    
Set wks = Sheets(1)
valMatch = "aa"

With wks    
    Set valRange = .Range("A1:A11")        
    Dim strCol As String
    strCol = Split(valRange.Address, "$")(1)
    '-> capture the column name of the evaluated range
        '-> NB -> the method below will fail if a multi column range is selected

    With valRange        
        If Not .Find(valMatch) Is Nothing Then
        '-> make sure valMatch exists, otherwise SpecialCells method will fail

            .AutoFilter 1, valMatch                    
            Set valRange = .SpecialCells(xlCellTypeVisible)
            '-> choose only cells where ValMatch is found

            strAddress = valRange.Address '-> capture address of found cells
            strAddress = Replace(Replace(strAddress, ":", ""), ",", "") '-> remove any commas and colons
            strAddress = Replace(strAddress, "$" & strCol & "$", ",") '-> replace $column$ with comma
            strAddress = Right(strAddress, Len(strAddress) - 1) '-> remove leading comma

            rowArray() = Split(strAddress, ",")

            '-> test print
            For I = 0 To UBound(rowArray())                    
                Debug.Print rowArray(I)                        
            Next

        End If 'If Not .Find(valMatch) Is Nothing Then            
    End With ' With valRange        
End With 'With wks

End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
Scott Holtzman
  • 27,099
  • 5
  • 37
  • 72
  • A definate +1 for the use of split, but the returned values do need to be Long as they are being fed into a row array used in an Index function to fill an array with data. – Steve Oct 23 '12 at 20:27
2

You may want to look at Find vs Match vs Variant Array which concludes that the variant array approach is fastest unless the hit density is very low.

But the fastest method of all is only for sorted data and exact match: use binary search to find the fisrt and last ocurrences and then get that subset of data into a variant array.

Charles Williams
  • 23,121
  • 5
  • 38
  • 38
1

I still have a loop, but only through the necessary rows to populate the array:

Sub get_row_numbers()

Dim RowArray() As Long
Dim valRange As Range
Dim valMatch As String

Set valRange = ActiveSheet.Range("A1:A11")
valMatch = "aa"
ReDim RowArray(WorksheetFunction.CountIf(valRange, valMatch) - 1)

Dim c As Range
Dim x As Integer
Set c = valRange.Find(What:=valMatch, LookIn:=xlFormulas, SearchOrder:=xlByRows, SearchDirection:=xlNext)

Do
  RowArray(x) = c.Row
  Set c = valRange.FindNext(after:=c)
  x = x + 1
Loop Until x = UBound(RowArray) + 1


End Sub
John Bustos
  • 19,036
  • 17
  • 89
  • 151
  • 1
    But is this faster than looping through the whole range picked up and held in an array, I am not sure, will need to test with the actual data. But a nice suggestion. – Steve Oct 23 '12 at 20:29
1

You have your range hard-coded in the example. Do you have a spare column to the right? If so, you could fill the cells to the right with 0 if it's not a match, or the row number if it is. Then pull that into an array and filter it. No loops:

Sub NoLoop()

Dim valMatch As String
Dim rData As Excel.Range, rFormula As Excel.Range
Dim a As Variant, z As Variant

    Set rData = ThisWorkbook.Worksheets(1).Range("A1:A11") 'hard-coded in original example
    Set rFormula = ThisWorkbook.Worksheets(1).Range("B1:B11") ' I'm assuming this range is currently empty
    valMatch = "aa" 'hard-coded in original example

    'if it's a valid match, the cell will state its row number, otherwise 0
    rFormula.FormulaR1C1 = "=IF(RC[-1]=""" & valMatch & """,ROW(RC),0)"

    a = Application.Transpose(rFormula.Value)
    z = Filter(a, 0, False) 'filters out the zeroes, you're left with an array of valid row numbers

End Sub

I have to credit Jon49 at One-dimensional array from Excel Range for the Application.Transpose trick to get a 1-d array.

Community
  • 1
  • 1
ExactaBox
  • 3,235
  • 16
  • 27
1

Everyone, thanks for your individual inputs.

ExactaBox, your solution has been much helpful to me. However, there is a catch in returning 0 value through formula

rFormula.FormulaR1C1= "=IF(RC[-1]=""" & valMatch & """,ROW(RC),0)".

Since VBA Filter function filters out values by making string comparisons, it would also filter out row numbers having zeroes in them. For example valid row numbers, 20, 30, 40 etc. shall also be filtered out because they contain zeroes, so it would be better to write a string in place of 0 in the formula, which could therefore be:

rFormula.FormulaR1C1= "=IF(RC[-1]=""" & valMatch & """,ROW(RC),""Valid"")"

as was also suggested by brettdj above, who used "x" string as the last argument.

Robert
  • 5,278
  • 43
  • 65
  • 115