0

In VBA, conveniently, programmers can access Excel functions via the Application.WorksheetFunction object. To save the programmer defining everything as spreadsheet-esque 2D arrays, the aggregate functions like WorksheetFunction.Average() will accept 1D arrays as well.

However, I've just spent the past few hours trying to pin down a unit test failure in a complex bit of statistical code, and I've found that the behaviour of the aggregate functions on 1D arrays changes once you go past an array length of 65536 (i.e. the max value of an Integer in VBA). In particular, the following code constructs the series 1,2,3,4,5,... and then takes the average, which should equal (n+1)/2:

Dim arr1D() As Double, arr2D() As Double, kk As Long

Debug.Print "Arr length", "True mean", "Excel mean (1D)", "Excel mean (2D)"
For kk = 65535 To 65538
    ReDim arr1D(1 To kk), arr2D(1 To kk, 1 To 1)
    For ii = 1 To kk
        arr1D(ii) = ii
        arr2D(ii, 1) = ii
    Next ii ' both arrays now contain 1, 2, 3, ...
    Debug.Print kk, (kk + 1) / 2, WorksheetFunction.Average(arr1D), WorksheetFunction.Average(arr2D)
Next kk

It produces this output - note the difference depending on whether the array is 1D or 2D:

Arr length  True mean  Excel mean (1D)  Excel mean (2D)
 65535       32768      32768            32768 
 65536       32768.5    32768.5          32768.5 
 65537       32769      1                32769 
 65538       32769.5    1.5              32769.5 

Similar weird behaviour is seen with other aggregate functions - for example StDev_S similarly rolls over at 65538 items, and outright crashes at 65537. In the words of my generation: nani the f***?

Excel version is: Microsoft® Excel® for Microsoft 365 MSO (Version 2205 Build 16.0.15225.20028) 32-bit. A possible related question is here. An obvious workaround is to implement my own aggregate functions - which is what I'm currently doing - but that doesn't answer the question of why the above code is doing what it does.

Alex
  • 33
  • 4
  • "*...single-dimensional arrays are passed back to Excel as a row of **columns**...*", cf Charles WIlliams' response here: https://stackoverflow.com/questions/11237195/array-size-limits-passing-array-arguments-in-vba – Jos Woolley Jul 15 '22 at 12:08
  • This seems related to a previously described limitation of `WorksheetFunction.Transpose` when transposing a 1D array wherein only the Only the **first** `uBound(1D_array) mod 2^16` elements will be returned to the transposed array. And, btw, the upper limit of the VBA integer data type is `32767`. – Ron Rosenfeld Jul 15 '22 at 12:20
  • @JosWoolley Does that really apply? Excel-365 has a limit 16,384 columns, yet 1D arrays larger than that can be successfully processed. – Ron Rosenfeld Jul 15 '22 at 12:29
  • @JosWoolley But the array never gets passed back to Excel; it gets consumed by the aggregate function which, in this case, is passed to Debug.Print. So the question of whether the single dimension reflects a row or a column should never come up. – Alex Jul 15 '22 at 14:46
  • @RonRosenfeld Good point re the VBA integer type - that suggests that the problem is an unsigned int in the underlying C code. Which I kinda expected given that this has all the symptoms of a classic case of integer wraparound. – Alex Jul 15 '22 at 14:50

0 Answers0