I promised a fuller answer. Sorry it is later than I expected:
- I got tied up with another problem,
- Technique 1, which I was expecting to recommend, did not work as I expected so I added some other techniques which are much more satisfactory.
As I said in my first comment:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
causes aArrayToPreserve
to have the default type of Variant. This does not match:
Dim Years() As String
As you discovered, redefining Years as a Variant, fixes the problems. An alternative approach would be to amend the declaration of ReDimPreserve
so aArrayToPreserve
is an array of type String. I would not recommend that approach since you are storing both strings and numbers in the array. A Variant array will handle either strings or numbers while a String array can only handle numbers by converting them to strings for storage and back to numbers for processing.
I tried your macro with different quantities of data and different amendments and timed the runs:
Rows of data Amendment Duration of run
3,500 Years() changed to Variant 4.99 seconds
35,000 Years() changed to Variant 502 seconds
35,000 aArrayToPreserve changed to String 656 seconds
As I said in my second comment, ReDim Preserve
is slow for both the inbuilt method and the VBA routine you found. For every call it must:
- find space for the new larger array
- copy the data from the old array to the new
- release the old array for garbage collection.
ReDim Preserve
is a very useful method but it must be used with extreme care. Sometimes I find that sizing an array to the maximum at the beginning and using ReDim Preserve to cut the array down to the used size at the end is a better technique. The best techniques shown below determine the number of entries required before sizing the array.
At the bottom of your routine, I added:
For i = LBound(Years, 1) To LBound(Years, 1) + 9
Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
For i = UBound(Years, 1) - 9 To UBound(Years, 1)
Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
This resulted in the following being output to the Immediate Window:
|||
|AAAA|2|2
|AAAB|3|4
|AAAC|5|7
|AAAD|8|11
|AAAE|12|16
|AAAF|17|22
|AAAG|23|23
|AAAH|24|25
|AAAI|26|28
|AOUJ|34973|34976
|AOUK|34977|34981
|AOUL|34982|34987
|AOUM|34988|34988
|AOUN|34989|34990
|AOUO|34991|34993
|AOUP|34994|34997
|AOUQ|34998|35002
|AOUR|35003|
|||
Since you have called the array Years
, I doubt my string values are anything like yours. This does not matter. What matters, is that I doubt this output was exactly what you wanted.
If you write:
ReDim Years(1, 3)
The lower bounds are set to the value specified by the Option Base
statement or zero if there is no Option Base
statement. You have lower bounds for both dimensions of zero which you do not use. This is the reason for the “|||” at the top. There is another “|||” at the end which means you are creating a final row which you are not using. The final used row does not have an end row which I assume in a mistake.
When I can divide a routine into steps, I always validate the result of one step before advancing to the next. That way, I know any problems are within the current step and not the result of an error in an earlier step. I use Debug.Print
to output to the Immediate Window most of the time. Only if I want to output a lot of diagnostic information will I write to a text file. Either way, blocks of code like mine are a significant aid to rapid debugging of a macro.
I would never write ReDim Years(1, 3)
. I always specify the lower bound so as to be absolutely clear. VBA is the only language I know where you can specify any value for the lower bound (providing it is less than the upper bound) so I will specify non-standard values if is helpful for a particular problem. In this case, I see not advantage to a lower bound other than one so that is what I have used.
With two dimensions arrays it is conventional to have columns as the first dimension and rows as the second. One exception is for arrays read from or to be written to a worksheet for which the dimensions are the other way round. You have rows as the first dimension. If you have used the conventional sequence you could have used the ReDim Preserve
method, thereby avoiding the RedimPreserve
function and the problem of non-matching types.
Technique 1
I expected this to be the fastest technique. Experts advise us to avoid “re-inventing the wheel”. That is, if Excel has a routine that will do what you want, don’t code an alternative in VBA. However, I have found a number of examples where this is not true and I discovered this technique was one of them.
The obvious technique here is to use Filter
, then create a range of the visible rows using SpecialCells
and finally process each row in this range. I have used this technique very successfully to meet other requirements but not here.
I did not know the VBA to select unique rows so started the macro recorder and filtered my test data from the keyboard to get:
Range("A1:A35000").AdvancedFilter Action:=xlFilterInPlace, Unique:=True
My past uses of Filter
have all converted to AutoFilter which I have found to give acceptable performance. This converted to AdvancedFilter
which took 20 seconds both from the keyboard and from VBA. I do not know why it is so slow.
The second problem was that:
Set RngUnique = .Range(.Cells(1, 1), .Cells(RowLast, 1)) _
.SpecialCells(xlCellTypeVisible)
was rejected as “too complicated”.
Not being able to get the visible rows as a range means the benefits of Filter
are not really available. I have counted the visible rows to simulate having RngUnique.Rows.Count
. This shows the technique which has always worked with AutoFilter
. If AdvancedFilter
had reported the unique rows in an accepted time I might have investigated this problem but under the circumstances it does not seem worth the effort.
The macro demonstrating this technique is:
Option Explicit
Sub Technique1()
' * Avoid using meaningless names like i. Giving every variable a meaningful
' name is helpful during development and even more helpful when you return
' to the macro in six months for maintenence.
' * My naming convention is use a sequence of keywords. The first keyword
' identifies what type of data the variable holds. So "Row" means it holds
' a row number. Each subsequent keyword narrows the scope. "RowSb" is a
' row of the worksheet "Simple Boundary" and "RowYears" is a row of the Years
' array. "RowSbCrnt"is the current row of the worksheet "Simple Boundary".
' * I can look at macros I wrote years ago and know what all the variables are.
' You may not like my convention. Fine, development your own but do not
' try programming with random names.
' * Avoid data type Integer which specifies a 16-bit whole number and requires
' special processing on 32 and 64-bit computers. Long is now the recommended
' data type for whole numbers.
Dim NumRowsVisible As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
' This can save significant amounts of time if the macro amends the
' screen or switches between workbooks.
Application.ScreenUpdating = False
With Worksheets("Simple Boundary")
' Rows.Count avoiding having to guess how many rows will be used
RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row
' Hide non-unique rows
With .Range(.Cells(1, 1), .Cells(RowSbLast, 1))
.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
End With
' Count number of unique rows.
' It is difficult to time small pieces of code because OS routines
' can execute at any time. However, this count takes less than .5
' of a second with 35,000 rows.
NumRowsVisible = 0
For RowSbCrnt = 2 To RowSbLast
If Not .Rows(RowSbCrnt).Hidden Then
NumRowsVisible = NumRowsVisible + 1
End If
Next
' Use count to ReDim array to final size.
ReDim Years(1 To 3, 1 To NumRowsVisible)
RowYearsCrnt = 1
Years(1, RowYearsCrnt) = .Cells(2, 1).Value
Years(2, RowYearsCrnt) = 2
For RowSbCrnt = 3 To RowSbLast
If Not .Rows(RowSbCrnt).Hidden Then
Years(3, RowYearsCrnt) = RowSbCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
Years(2, RowYearsCrnt) = RowSbCrnt
End If
Next
' Record final row for final string
Years(3, RowYearsCrnt) = RowSbLast
.ShowAllData ' Clear AdvancedFilter
End With
Application.ScreenUpdating = True
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
End Sub
The output to the Immediate Window is:
Duration: 20.570
AAAA|2|2|
AAAB|3|4|
AAAC|5|7|
AAAD|8|11|
AAAE|12|16|
AAAF|17|22|
AAAG|23|23|
AAAH|24|25|
AAAI|26|28|
AOUI|34970|34972|
AOUJ|34973|34976|
AOUK|34977|34981|
AOUL|34982|34987|
AOUM|34988|34988|
AOUN|34989|34990|
AOUO|34991|34993|
AOUP|34994|34997|
AOUQ|34998|35002|
AOUR|35003|35008|
As you can see the last row is correct. A duration of 20 seconds is better than the 8 minutes of your technique but I am sure we can do better.
Technique 2
The next macro is similar to the last one but it counts the unique rows rather than use AdvancedFilter to hide the non-unique rows. This macro has a duration of 1.5 seconds with 35,000 rows. This demonstrates that counting how many rows are required for an array in a first pass of the data is a viable approach. The diagnostic output from this macro is the same as above.
Sub Technique2()
Dim NumRowsUnique As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
With Worksheets("Simple Boundary")
RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row
' Count number of unique rows.
' Assume all data rows are unique until find otherwise
NumRowsUnique = RowSbLast - 1
For RowSbCrnt = 3 To RowSbLast
If .Cells(RowSbCrnt, 1).Value = .Cells(RowSbCrnt - 1, 1).Value Then
NumRowsUnique = NumRowsUnique - 1
End If
Next
' * Use count to ReDim array to final size.
' * Note that I have defined the columns as the first dimension and rows
' as the second dimension to match convention. Had I wished, this would
' have allowed me to use the standard ReDim Preserve which can only
' adjust the last dimension. However, this does not match the
' syntax of Cells which has the row first. It may have been better to
' maintain your sequence so the two sequences were the same.
ReDim Years(1 To 3, 1 To NumRowsUnique)
RowYearsCrnt = 1
Years(1, RowYearsCrnt) = .Cells(2, 1).Value
Years(2, RowYearsCrnt) = 2
For RowSbCrnt = 3 To RowSbLast
If .Cells(RowSbCrnt, 1).Value <> .Cells(RowSbCrnt - 1, 1).Value Then
Years(3, RowYearsCrnt) = RowSbCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
Years(2, RowYearsCrnt) = RowSbCrnt
End If
Next
' Record final row for final string
Years(3, RowYearsCrnt) = RowSbLast
End With
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
End Sub
Technique 3
The next macro is only slightly changed from the last.
Firstly, I have replaced the literals used to identify the column numbers in worksheets and arrays with constants such as:
Const ColYrEnd As Long = 3
Under my naming convention ColYrEnd
= Column of Year array holding range End hence:
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
instead of Years(3, RowYearsCrnt) = RowCvCrnt - 1
This makes no difference to the compiled code but makes the source code easier to understand because you do not have to remember what columns 1, 2 and 3 hold. More importantly, if you ever have to rearrange the columns, updating the constants is the only change required. If you ever have to search through a long macro replacing every use of 2 as a column number (while ignoring any other use of 2) by 5, you will know why this is important.
Secondly, I have used:
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
to import column 1 to an array. The code that read the values from the worksheet now reads them from this array. Array access is much faster than worksheet access so this reduces the runtime from 1.5 seconds to .07 seconds.
The revised code is:
Sub Technique3()
Const ColCvYear As Long = 1
Const ColSbYear As Long = 1
Const ColYrYear As Long = 1
Const ColYrStart As Long = 2
Const ColYrEnd As Long = 3
Const RowSbDataFirst As Long = 2
Const RowCvDataFirst As Long = 2
Dim ColValues As Variant
Dim NumRowsUnique As Long
Dim RowCvCrnt As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
With Worksheets("Simple Boundary")
RowSbLast = .Cells(Rows.Count, ColSbYear).End(xlUp).Row
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
' * The above statement imports all the data from column 1 as a two dimensional
' array into a Variant. The Variant is then accessed as though it is an array.
' * The first dimension has one entry per row, the second dimension has on entry
' per column which is one in this case. Both dimensions will have a lower bound
' of one even if the first row or column loaded is not one.
End With
' Count number of unique rows.
' Assume all data rows are unique until find otherwise
NumRowsUnique = UBound(ColValues, 1) - 1
For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
If ColValues(RowCvCrnt, ColCvYear) = ColValues(RowCvCrnt - 1, ColCvYear) Then
NumRowsUnique = NumRowsUnique - 1
End If
Next
' I mentioned earlier that I was unsure if having rows and columns in the
' convention sequence was correct. I am even less sure here where array
' ColValues has been loaded from a worksheet and the rows and columns are
' not in the conventional sequence. ReDim Years(1 To 3, 1 To NumRowsUnique)
RowYearsCrnt = 1
Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvDataFirst, ColCvYear)
Years(ColYrStart, RowYearsCrnt) = RowCvDataFirst
For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
If ColValues(RowCvCrnt, ColCvYear) <> ColValues(RowCvCrnt - 1, ColCvYear) Then
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvCrnt, ColCvYear)
Years(ColYrStart, RowYearsCrnt) = RowCvCrnt
End If
Next
' Record final row for final string
Years(ColYrEnd, RowYearsCrnt) = UBound(ColValues, 1)
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
Years(ColYrStart, RowYearsCrnt) & "|" & _
Years(ColYrEnd, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
Years(ColYrStart, RowYearsCrnt) & "|" & _
Years(ColYrEnd, RowYearsCrnt) & "|"
Next
End Sub
Other techniques
I considered introducing other techniques but I decided they were not useful for this requirement. Also, this answer is already long enough. I have provided much for you to think about and more would just be overload. As stated above I have reduced the run time for 35,000 rows from 8 minutes to 20 seconds to 1.5 seconds to .07 seconds.
Work slowly through my macros. I have hope I have provided adequate explanation of what each is doing. Once you know a statement exists, it is generally easy to look it up so there is not too much explanation of the statements. Come back with questions as necessary.