1

I have a pretty technical question for which I don't imagine the answer and I would like to get an advice for optimization issues.

In my sheet, I've built a lot of XML lines. Specifically, they are 389,256 lines spread across the Range("A1") -> Range("A389256").

My objective is to build a string which contains all these lines, to hence fill it into an XML file. I do this with the following piece of code:

Private Function buildFileText() As String

    Dim ss As String
    Dim j As Long

    For j = 1 To Sheets("FileContent").Range("A1").End(xlDown).Row
        ss = ss & Sheets("FileContent").Range("A" & j).Value & vbNewLine
    Next j

    buildFileText = ss

End Function

Basically, I just build the string starting from an empty string and adding line by line all the content of my spreadsheet.

What is scaring is the time that this piece of code takes to execute: I've put a timer right before and right after the For loop, and it took 1 hour and 44 minutes to execute.

I don't find this behavior normal, because although the high number of lines, if I try to perform the same action on (let's say) 10,000 lines, it takes not even a second. Imagining it would take a second, I would expect the whole action to take 1 second*40=40 seconds approximately. On the other hand, if it was a purely memory issue, I would have expected a stack overflow which didn't occur. So it seems the time it takes to perform each concatenation grows exponentially.

My questions:

  1. Would anyone explain me why this is happening?
  2. Does anyone have any suggestion to improve the performance of this code? Maybe I should split the concatenation in several strings (say 40 strings of 10k lines each) and concatenate each of them at a later stage?
Matteo NNZ
  • 11,930
  • 12
  • 52
  • 89
  • 2
    stream it directly to the file instead of first creating the in-memory representation of that file.... – rene Dec 06 '16 at 13:10
  • I'm backing @rene – user3598756 Dec 06 '16 at 13:12
  • @rene, thanks, trying both, will keep you posted . I think indeed streaming them directly into the file might be faster, I'll have a try. – Matteo NNZ Dec 06 '16 at 13:15
  • can you just save the worksheet as textfile? – Nathan_Sav Dec 06 '16 at 13:20
  • 1
    Why are you building an XML file as text directly from a spreadsheet to begin with? Using the [MSXML objects](https://msdn.microsoft.com/en-us/library/ms760218(v=vs.85).aspx) would be much easier and faster because it would be doing all the heavy lifting for you. – Comintern Dec 06 '16 at 13:32
  • @rene, streaming the lines directly in the file has reduced the time from 1h44 to 35 minutes. Still a lot, but much better than before. Thanks a lo. – Matteo NNZ Dec 06 '16 at 14:48

2 Answers2

2

other than the ways you see in comments, you may also want to try an "array" approach

whose maximum array size limitation can be overcome by splitting it in as much subarrays as necessary, as follows:

Private Function buildFileText() As String    
    Dim ss As String
    Dim count As Long

    With Worksheets("FileContent")
        With .Range("A1", .Cells(.Rows.count, 1).End(xlUp))
            Do While .count - count > 24684
                ss = ss & Join(Application.Transpose(.Offset(count).Resize(24684).Value), vbNewLine)
                count = count + 24684
            Loop
            buildFileText = ss & Join(Application.Transpose(.Offset(count).Resize(.count - count).Value), vbNewLine)                
        End With
    End With        
End Function
user3598756
  • 28,893
  • 4
  • 18
  • 28
2

It does not take a long time to initialize Huge strings.

Strings are not resizable. When you concatenate two strings together a Temp string is created to can hold both values. The values are then assigned to the Temp string and the target string is then replaced with the temp string.

Why does it take less than a Second for the first 10,00 lines and 389,256 takes 1 hour and 44 minutes?

"So it seems the time it takes to perform each concatenation grows exponentially" - It is actually growing at a consistent rate. If it were growing exponentially Excel would crash pretty quickly.

But the problem is that it is growing and every time you concatenate more memory is needed to create the new larger strings.

What can we do to improve performance?

In your case I would use MSXML2 to create the XML output. It is well documented will make your code easily extendable.

The second option is to implement a String Builder Pattern. String Builders reduce the number of concatenations by initializing a very large output string and writing the new strings to the next position in the output string.

My answer to Excel vba xml parsing performance shows how use a String BUilder Pattern to expost an Excel table as XML. Parfiat's answer to the same question demonstrates how to use MSXML2 to create the XML file.

Community
  • 1
  • 1