0

I have a large worksheet (~250K rows, 22 columns, ~40MB plain data) which has to transfer its content to an intranet API. Format does not matter. The problem is: When accessing the data like

Const ROWS = 250000
Const COLS = 22

Dim x As Long, y As Long
Dim myRange As Variant
Dim dummyString As String
Dim sb As New cStringBuilder

myRange = Range(Cells(1, 1), Cells(ROWS, COLS)).Value2

For x = 1 To ROWS
    For y = 1 To COLS
        dummyString = myRange(x, y) 'Runtime with only this line: 1.8s
        sb.Append dummyString 'Runtime with this additional line 163s
    Next
Next

I get a wonderful 2D array, but I am not able to collect the data efficiently for HTTP export. An X/Y loop over the array and access myRange[x, y] has runtimes >1min. I was not able to find an array method which helps to get the imploded/encoded content of the 2D array. My current workaround is missusing the clipboard (Workaround for Memory Leak when using large string) which works fast, but is a dirty workaround in my eyes AND has one major problem: The values I get are formatted, “.Value” and not “.Value2”, so I have to convert the data on server site again before usage, e.g. unformat currency cells to floats.

What could be another idea to deal with the data array?

Matschek
  • 205
  • 1
  • 9
  • What type of object is `myRange`? I suspect you declared it as `Range` object and not an actual variant. If you declare as variant, looping through it will be MUCH faster. [See example here](https://stackoverflow.com/questions/37689847/creating-an-array-from-a-range-in-vba) – Scott Holtzman Sep 24 '18 at 14:29
  • 1
    Please show your existing loop code. There are VBA implmentations of a "stringbuilder" class which likely is what you want here. Also "250M" rows - do you mean 250k? – Tim Williams Sep 24 '18 at 15:06
  • Scott: Yes it is a variant, the `set` was copied by mistake. Tim: 250K, not 250M, thanks. I updated the question with the complete test function – Matschek Sep 24 '18 at 15:18
  • You might want to replace `Dim sb As New cStringBuilder` with `Dim sb As cStringBuilder: Set sb = new cstringbuilder`. So the object's instance doesn't need to be checked each time you refer to it. – chillin Sep 24 '18 at 18:00
  • I presume your stringbuilder class/object is using some form of `join` to reduce overhead of concatenating immutable strings -- and is therefore reasonably efficient. – chillin Sep 24 '18 at 18:03
  • @chillin: The Stringbuilder class is one of the VBA implementations I found which makes use of the Windows CopyMemory API (http://www.vbaccelerator.com/home/VB/Code/Techniques/StringBuilder/String_Builder_Class_and_Demonstration_zip_cStringBuilder_cls.html) I changed the `Dim` line, but It makes no difference. – Matschek Sep 25 '18 at 13:06
  • A direct API call of CopyMemor to get the whole Array structure to a string/byte buffer could be a possibility, but I'm new to that topic (array pointers, size of memory block etc., hard to start...). – Matschek Sep 25 '18 at 14:03
  • Sorry if I'm covering old ground/stuff you've already tried, but how long does it take with simple `cstr` and `join`? – chillin Sep 25 '18 at 18:26
  • Can you provide more Details of your thoughts? join does not work with 2D Arrays, afaik. Or do you mean collecting all values into a 1D Array an then join to a String? – Matschek Sep 26 '18 at 11:12
  • Yes, my thoughts are that you create two string arrays `A` and `B`. `A` can be of size `1 to ROWS`, `B` can be of size of `1 to COLUMNS`. As you loop over each row in your `myRange` array, fill each element in `B` with each column's value in that row. After the final column for that row and before you move to the next row, join array `B` and assign to the row in `A`. With a loop of this size, only put necessary stuff inside the loop itself. At the end you would join `A`. You might need to use `cstr` when assigning items to `B`. would write an answer, but my comment is experimental. – chillin Sep 26 '18 at 18:32
  • @chillin, please post this as an answer so I can give you the points. I tried and it works perfect, my plain runtime collecting the data is ~ 3.5s Is seems the "two-level-join-contruct" is much more efficient than the other ways – Matschek Sep 28 '18 at 07:33
  • @Matschek Sure, I have posted this as an answer, as well as some code in case it benefits anyone else. – chillin Sep 28 '18 at 19:55

1 Answers1

1

My thoughts are that you create two string arrays A and B. A can be of size 1 to ROWS, B can be of size of 1 to COLUMNS. As you loop over each row in your myRange array, fill each element in B with each column's value in that row. After the final column for that row and before you move to the next row, join array B and assign to the row in A. With a loop of this size, only put necessary stuff inside the loop itself. At the end you would join A. You might need to use cstr() when assigning items to B.

Matschek (OP) was able to write the code based on the above, but for anyone else's benefit, the code itself might be something like:

Option Explicit

Private Sub concatenateArrayValues()

    Const TOTAL_ROWS As Long = 250000
    Const TOTAL_COLUMNS As Long = 22

    Dim inputValues As Variant
    inputValues = ThisWorkbook.Worksheets("Sheet1").Range("A1").Resize(TOTAL_ROWS, TOTAL_COLUMNS).Value2

    ' These are static string arrays, as OP's use case involved constants.
    Dim outputArray(1 To TOTAL_ROWS) As String ' <- in other words, array A
    Dim interimArray(1 To TOTAL_COLUMNS) As String ' <- in other words, array B

    Dim rowIndex As Long
    Dim columnIndex As Long

    ' We use constants below when specifying the loop's limits instead of Lbound() and Ubound()
    ' as OP's use case involved constants.
    ' If we were using dynamic arrays, we could call Ubound(inputValues,2) once outside of the loop
    ' And assign the result to a Long type variable
    ' To avoid calling Ubound() 250k times within the loop itself.

    For rowIndex = 1 To TOTAL_ROWS
        For columnIndex = 1 To TOTAL_COLUMNS
            interimArray(columnIndex) = inputValues(rowIndex, columnIndex)
        Next columnIndex
        outputArray(rowIndex) = VBA.Strings.Join(interimArray, ",")
    Next rowIndex

    Dim concatenatedOutput As String
    concatenatedOutput = VBA.Strings.Join(outputArray, vbNewLine)

    Debug.Print concatenatedOutput

    ' My current machine isn't particularly great
    ' but the code above ran and concatenated values in range A1:V250000
    ' (with each cell containing a random 3-character string) in under 4 seconds.

End Sub
chillin
  • 4,391
  • 1
  • 8
  • 8