1

I use the code hereunder to calculate max values as described in this post (vba max value of group of values). The code works great but once I have more than 65k lines I get a data type mismatch when trying to pase the array:

sht.Range(Cells(1, lColumn), Cells(last.Row, lColumn)).Value = Application.Index(groupsArray, , lColumn)

Could somebody help me to slice the array in chunks. I have tried to get it working myself but without any luck.

    Sub FillGroupsMax()
        Dim lColumn As Long
        Dim sht As Worksheet
        Dim groupsArray As Variant    'array with all group infomation
        Dim groupsSeen As Variant    'array with group infomation already seen

        Application.ScreenUpdating = False    'stop screen updating makes vba perform better

        Set sht = ThisWorkbook.Worksheets("import")
        Set last = sht.Range("A:A").Find("*", Cells(1, 1), searchdirection:=xlPrevious)    'last cell with value in column A
        lColumn = sht.Cells(1, Columns.Count).End(xlToLeft).Column

        groupsArray = sht.Range(Cells(1, 1), Cells(last.Row, lColumn))
        'collect all the information on the Sheet into an array
        'Improves performance by not visiting the sheet

        For dRow = 2 To last.Row    'for each of the rows skipping header

            'check if group as already been seen
            If inArrayValue(Cells(dRow, 1).Value, groupsSeen) > 0 Then
                'if it has been seen/calculated attribute value
                'Cells(dRow, 4).Value = inArrayValue(Cells(dRow, 1).Value, groupsSeen)
                groupsArray(dRow, lColumn) = inArrayValue(Cells(dRow, 1).Value, groupsSeen)
            Else
                'if it hasn't been seen then find max
                'Cells(dRow, 4).Value = getMax(Cells(dRow, 1).Value, groupsArray)
                groupsArray(dRow, lColumn) = getMax(Cells(dRow, 1).Value, groupsArray, lColumn)

                'array construction from empty
                If IsEmpty(groupsSeen) Then
                    ReDim groupsSeen(0)
                    'groupsSeen(0) = Array(Cells(dRow, 1).Value, Cells(dRow, 4).Value)
                    groupsSeen(0) = Array(groupsArray(dRow, 1), groupsArray(dRow, lColumn))
                    'attribute value to array
                Else
                    ReDim Preserve groupsSeen(0 To UBound(groupsSeen) + 1)
                    groupsSeen(UBound(groupsSeen)) = Array(groupsArray(dRow, 1), groupsArray(dRow, lColumn))
                End If
            End If
        Next

    sht.Range(Cells(1, lColumn), Cells(last.Row, lColumn)).Value = Application.Index(groupsArray, , lColumn)
        'reactivate Screen updating
        Application.ScreenUpdating = True

    End Sub

    Function getMax(group As String, groupsArray As Variant, lColumn As Long) As Double

        'for each in array
        For n = 1 To UBound(groupsArray)
            'if its the same group the Max we seen so far the record
            If groupsArray(n, 1) = group And groupsArray(n, lColumn - 1) > maxSoFar Then
                maxSoFar = groupsArray(n, lColumn - 1)
            End If
        Next

        'set function value
        getMax = maxSoFar
    End Function

    Function inArrayValue(group As String, groupsSeen As Variant) As Double

        'set function value
        inArrayValue = 0
        'if array is empty then exit
        If IsEmpty(groupsSeen) Then Exit Function

        'for each in array
        For n = 0 To UBound(groupsSeen)
            'if we find the group
            If groupsSeen(n)(0) = group Then
                'set function value to the Max value already seen
                inArrayValue = groupsSeen(n)(1)
                'exit function earlier
                Exit Function
            End If
        Next

    End Function
ceci
  • 589
  • 4
  • 14
  • Are you saying if groupsArray has more than 65k rows then you cannot slice a column and assign to the worksheet? – QHarr Feb 03 '18 at 16:49
  • Are you using .xls files, as these only have 65k rows. If so save the file as .xlsx as it will then have over a million rows – David wyatt Feb 03 '18 at 14:50

3 Answers3

2

You can write a helper function to use instead of Application.Index

Bonus - it will be much faster than using Index (>5x)

Sub Tester()

    Dim arr, arrCol

    arr = Range("A2:J80000").Value

    arrCol = GetColumn(arr, 5) '<< get the fifth column

    Range("L2").Resize(UBound(arrCol, 1), 1).Value = arrCol

End Sub

'extract a single column from a 1-based 2-D array
Function GetColumn(arr, colNumber)
    Dim arrRet, i As Long
    ReDim arrRet(1 To UBound(arr, 1), 1 To 1)
    For i = 1 To UBound(arr, 1)
        arrRet(i, 1) = arr(i, colNumber)
    Next i
    GetColumn = arrRet
End Function

EDIT - since QHarr asked about timing here's a basic example

Sub Tester()
    Dim arr, arrCol, t, i as long
    arr = Range("A2:J80000").Value
    t = Timer
    For i = 1 to 100 
        arrCol = GetColumn(arr, 5) '<< get the fifth column
    Next i
    Debug.print Timer - t '<<# of seconds for execution
End Sub
Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • Can't believe I missed such a simple solution! Good to know how much faster this is. Can you recommend any particularly references to setting up timed tests with vba? I have used some Google results in the past. – QHarr Feb 04 '18 at 06:50
  • 1
    I just use `Timer` and put it in a reasonable-sized loop so there are enough iterations to last a few seconds. – Tim Williams Feb 04 '18 at 08:07
  • Hi thx, this works exactly as expected with the most simple approach. Selected as best answer. – ceci Feb 04 '18 at 09:48
0

Below, whilst not as tidy as could be, is a way to process an array in chunks and Index to access a column and write out to the sheet.

I populated two columns (A:B) with data. Both had 132,000 rows, populated incrementally, with values from 1 to 132,000 in each column for my test run.

You can fiddle with cutOff to get the chunk size just below the point where the fail happens.

The code below is simply to demonstrate the principle of looping in batches, upto the set cutoff in each batch, until all rows have been processed.

 Option Explicit

 Public Sub WriteArrayToSheet()

   Dim wb As Workbook
   Dim ws As Worksheet

   Set wb = ThisWorkbook
   Set sht = wb.Worksheets("Sheet1") 'change as appropriate

   Dim myArr() 'dynamic array

   myArr = sht.Range("A1").CurrentRegion.Value 'you may want a more robust method

   Dim cutOff As Long 'the max value - what ever it is before error occurs
   cutOff = 1000

   Dim totalRows As Long 'total rows in array read in from sheet
   totalRows = UBound(myArr, 1)

   Dim totalArraysNeeded As Long

   'Determine how many lots of cutOff chunks there are in the total number of array rows
   totalArraysNeeded = Application.WorksheetFunction.Ceiling(totalRows / cutOff, 1)

   Dim rotations As Long 'number of times to loop original array to handle all rows
   Dim rowCountTotal As Long
   Dim rowCount As Long
   Dim tempArr() 'this will hold the chunk of the original array

   Dim rowCounter As Long
   Dim lastRow As Long
   Dim nextRow As Long
   Dim i As Long
   Dim j As Long
   Dim numRows As Long

   rotations = 1

   Do While rotations < totalArraysNeeded

       If rotations < totalArraysNeeded - 1 Then

           ReDim tempArr(1 To cutOff, 1 To UBound(myArr, 2)) 'size chunk array
           numRows = cutOff

       Else
           numRows = totalRows - rowCountTotal
           ReDim tempArr(1 To numRows, 1 To UBound(myArr, 2)) 'size chunk array

       End If

       For i = 1 To numRows

           rowCount = 1 'rows in this chunk looped
           rowCountTotal = rowCountTotal + 1  'rows in original array looped

           For j = LBound(myArr, 2) To UBound(myArr, 2)

               tempArr(i, j) = myArr(rowCountTotal, j)

           Next j

           rowCount = rowCount + 1

       Next i

       With sht
           lastRow = .Cells(.Rows.Count, "E").End(xlUp).Row 'Column where I am writing the sliced column out to
       End With

       If lastRow = 1 Then
           nextRow = 1
       Else
           nextRow = lastRow + 1
       End If

       sht.Range("E" & nextRow).Resize(UBound(tempArr, 1), 1) = Application.Index(tempArr, , 1) 'write out to sheet

       rotations = rotations + 1

   Loop

End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
0

As @Tim suggested, the best way to slice a large array is use a loop to copy the column.

Though in your case, most of the processing time is spent on computing the maximum since your code is using a nested loop.

If you want to reduce significantly the processing time, then use a dictionary:

Sub Usage
   GetMaxByGroupTo _
        sourceGroups := ThisWorkbook.Range("Sheet1!A2:A100"), _
        sourceValues := ThisWorkbook.Range("Sheet1!B2:B100"), _
        target := ThisWorkbook.Range("Sheet1!C2")
End Sub

Sub GetMaxByGroupTo(sourceGroups As Range, sourceValues As Range, target As Range)
    Dim dict As Object, groups(), values(), r As Long, max

    Set dict = CreateObject("Scripting.Dictionary")
    groups = sourceGroups.Value2
    values = sourceValues.Value2

    ' store the maximum value of each group in a dictionary for an efficient lookup '

    For r = Lbound(groups) to Ubound(groups)
        max = dict(groups(r, 1))
        If VarType(max) And values(r, 1) <= max Then Else dict(groups(r, 1)) = values(r, 1)
    Next

    ' build and copy the result array to the sheet  '

    For r = Lbound(groups) to Ubound(groups)
        values(r, 1) = dict(groups(r, 1))
    Next

    target.Resize(Ubound(groups), 1).Value2 = values

End Sub
Florent B.
  • 41,537
  • 7
  • 86
  • 101
  • wow nice, this is indeed faster. but what if I would like to exclude some group ID's? – ceci Feb 04 '18 at 17:10
  • use a second dictionary to store the groups to exclude and then test the group in the loop: `If Not dictExclude.Exists(groups(r, 1)) Then`. The benefit of using a dictionary is that the lookup time doesn't increase when the number of records increases. – Florent B. Feb 04 '18 at 17:19
  • 1
    great got it working. previous script dit 220k rows in less than 2 min. this one does in in less than 2 sec!! – ceci Feb 04 '18 at 17:55