
The following simple code copies some cells to the clipboard and reads them back as text. Later this data will be POSTed to an internal webservice, but that part is commented out for now. Every time I run the script, Excel's memory usage grows by about 80MB (twice the size of the string). I have no clue where or why the garbage collector fails, or what I can do against this.

Sub memory_leak()
    Dim DataObj As New MSForms.DataObject
    Dim data As String

    'Copy Data Range ~ the first 250.000 rows / 22 columns
    Range(Cells(9, 1), Cells(250000, 22)).Copy

    DataObj.GetFromClipboard
    data = DataObj.GetText

    ' Call postToURL("http://intranet/API.php?action=test", data)

    'Try to free memory...
    Application.CutCopyMode = False
    DataObj.Clear
    data = ""
End Sub
Matschek
    What's the point of sending it to the clipboard and doing `.GetText`? Why not just use `Range(...).Values`? – BruceWayne Sep 18 '18 at 15:47
    `Dim .... As New` is declaring an auto-instantiated object, which means `DataObj` can't really be destroyed (try `Set DataObj = Nothing` and then `Debug.Print DataObj Is Nothing`). I'd start by `Set`ting it on a separate instruction. – Mathieu Guindon Sep 18 '18 at 16:03
  • @BruceWayne: `Range(...).Value` gives a 2D array, which is almost impossible to work with when trying to make a string out of it. All of my attempts result in very (very) large runtimes when looping over the structure. And `Join` only works with 1D arrays. – Matschek Sep 18 '18 at 16:11
  • @MathieuGuindon: Interesting point. What exactly do you mean by "setting it on a separate instruction"? I just tried to dim DataObj in the global scope to force reuse, but this leads to another error (DataObject:GetText Data in clipboard is invalid) – Matschek Sep 18 '18 at 16:17
  • if the range is a set range (not changing)... a naive solution would be to just create 25 variables to hold 10,000 lines each and add them individually to your POST, or 5 of 50k or so lol – Kubie Sep 18 '18 at 16:19
  • I never said to widen its scope... Just `Dim` and `Set` separately; `Dim . As New` is a recipe for unexpected behavior, esp. wrt memory management. – Mathieu Guindon Sep 18 '18 at 16:20
    VBA isn't garbage-collected, it's reference-counted. Note that `Dim` statements are NOT executable (you can't place a breakpoint on them), but `Set` statements are. So `Dim DataObj As MSForms.DataObject` and on the next line `Set DataObj = New MSForms.DataObject`, is all I meant. That said the problem might be elsewhere, but `Dim ..As New` can't possibly be helping. – Mathieu Guindon Sep 18 '18 at 16:30
  • Just a thought: assuming 50 characters per cell, 2 bytes per character, we're looking at `22*250000*50*2/1024/1024/1024` ~= half a gigabyte being transmitted over an HTTP POST request. Since this is an internal intranet solution, wouldn't it be a better idea to have the file on a network share, and then having some SSIS package grab the content and `SqlBulkInsert` it directly to a database? VBA can do lots of things, but sometimes VBA *isn't* the ideal solution. That said, 80MB would be about 7-8 characters per cell, and half of that would be 3-4 characters. Are you sure about your numbers? – Mathieu Guindon Sep 18 '18 at 17:02
  • I tried with dim/set, same result. I replaced the assignment `data = DataObj.GetText` with a `Call DataObj.GetText` and still get the growing memory effect, so the MSForms.DataObject should be the real problem, not the string. I'll dig further in that direction. – Matschek Sep 19 '18 at 08:33
  • About the data size: 1 byte per character and ~ 10 almost empty columns lead to ~ 40MB of data, this value is correct. I'm comparing different ways within/without VBA. The current implementation exports a CSV and triggers a PHP script on the Server which reads it into a MySQL database. The last time I investigated alternative solutions for importing the XLSX file on the server I got stuck on other problems (ODBC import did not “guess” some cell formats correctly, PHP XLSX libraries failed to handle the large amount of data…) but that’s another topic and maybe worth another thread. – Matschek Sep 19 '18 at 08:36
  • I implemented the clipboard access via API. The runtime of the copy increases from 2.5s to 3.3s, but that is fine here. The memory consumption grows by 140MB but drops back down again after the work is done. Maybe a better API implementation could be even more efficient; I copied from http://access.mvps.org/access/api/api0049.htm – Matschek Sep 19 '18 at 09:05

1 Answer


The root cause is not the handling of large strings. The problem is the call to the `MSForms.DataObject.GetText` method, which does not release memory correctly. The clipboard can alternatively be accessed via the native Windows API. The page http://access.mvps.org/access/api/api0049.htm shows an example implementation which works fine in my tests, even though it is roughly 30% slower than the MSForms method.
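A minimal sketch of the API-based approach (my own adaptation of the idea on that page, not the exact code from it): the declarations below assume 64-bit Office/VBA7 — drop `PtrSafe` and use `Long` instead of `LongPtr` on 32-bit — and error handling is kept to a minimum for brevity.

```vba
' Read clipboard text via the Windows API instead of MSForms.DataObject.
Private Declare PtrSafe Function OpenClipboard Lib "user32" (ByVal hwnd As LongPtr) As Long
Private Declare PtrSafe Function GetClipboardData Lib "user32" (ByVal wFormat As Long) As LongPtr
Private Declare PtrSafe Function CloseClipboard Lib "user32" () As Long
Private Declare PtrSafe Function GlobalLock Lib "kernel32" (ByVal hMem As LongPtr) As LongPtr
Private Declare PtrSafe Function GlobalUnlock Lib "kernel32" (ByVal hMem As LongPtr) As Long
Private Declare PtrSafe Function GlobalSize Lib "kernel32" (ByVal hMem As LongPtr) As LongPtr
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByVal Destination As LongPtr, ByVal Source As LongPtr, ByVal Length As LongPtr)

Private Const CF_UNICODETEXT As Long = 13

Public Function GetClipboardText() As String
    Dim hMem As LongPtr, pMem As LongPtr, cb As LongPtr, nul As Long

    If OpenClipboard(0) = 0 Then Exit Function
    hMem = GetClipboardData(CF_UNICODETEXT)
    If hMem <> 0 Then
        pMem = GlobalLock(hMem)
        If pMem <> 0 Then
            ' GlobalSize is in bytes; CF_UNICODETEXT is 2 bytes per character
            cb = GlobalSize(hMem)
            GetClipboardText = Space$(CLng(cb) \ 2)
            CopyMemory StrPtr(GetClipboardText), pMem, cb
            GlobalUnlock hMem
            ' Trim at the terminating null character, if present
            nul = InStr(GetClipboardText, vbNullChar)
            If nul > 0 Then GetClipboardText = Left$(GetClipboardText, nul - 1)
        End If
    End If
    CloseClipboard
End Function
```

In the original `memory_leak` sub, `data = DataObj.GetText` would then become `data = GetClipboardText()`, and the `MSForms.DataObject` can be removed entirely.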
