0

I have a macro for converting multiple Word documents to HTML format:

Sub macro1()
Dim objCC As ContentControl
Dim dataline As String
Dim doc As Document
Open "D:\Data\find1" For Input As #1
While Not eof(1)
    Line Input #1, dataline
    Debug.Print dataline
    Set doc = Documents.Open(dataline)
        Do While doc.ContentControls.Count > 0
            For Each objCC In doc.ContentControls
                objCC.Delete False
            Next
        Loop
    doc.SaveAs ActiveDocument.Path + "/" + ActiveDocument.Name + ".html", wdFormatHTML
    doc.Close
Wend
Close #1
End Sub

The list of documents is in the file find1. I have about 20000 documents, but after converting of about 1000 files the memory is full, and the system hangs. Is there a way to avoid it?

braX
  • 11,506
  • 5
  • 20
  • 33
Michael
  • 5,095
  • 2
  • 13
  • 35
  • 1
    Probably You need `Set doc = nothing` after `doc.Close` – Teamothy Nov 11 '19 at 12:50
  • What happens if you comment out certain functional parts from the loop (the `Do While` loop, the `.SaveAs`)? Independently of that, I would cut out the entire `For Each` loop in favor of just `doc.ContentControls(doc.ContentControls.Count - 1).Delete`. – Tomalak Nov 11 '19 at 12:51
  • 1
    This sounds very similar to an issue I have encountered in Excel: When you Open the document, and then Close it again, Office does not free all of the memory. The only way to clear it is to completely close Excel, and then open it again. (You *might* be able to squeeze slightly more memory out by opening the documents as Read-Only?) – Chronocidal Nov 11 '19 at 12:55
  • @Chronocidal but if open the file as read-only I'll be not able to delete ContentControls in the inner loop? – Michael Nov 11 '19 at 13:02
  • @CindyMeister No, after inserting `set doc=Nothing` and launching the used memory is increasing linearly about 10MB/s – Michael Nov 11 '19 at 18:34
  • Interesting, that setting to Nothing shows that behavior... You did put it just before `Wend`? FWIW I also agree with Tomalak that there's no need for *both* the `Do`- loop and the `For...Each`. And I'd also use `Set objCC = Nothing` a line before setting doc to Nothing. How about trying a `Sleep` or `DoEvents`, does that make any difference. FWIW *possibly* saving to a different file name is still holding the original doc file in memory, as well and Word needs a chance to release both files (html and doc). Also, check the folder for "temp" files now, as well as when things stop again. – Cindy Meister Nov 11 '19 at 18:42
  • @CindyMeister `Set doc=Nothing` will have no effect mid-loop; after `doc.close`, you loop straight back around to `Set doc = Documents.Open(dataline)`. It *might* have a small effect once you finish the loop (and have processed all 20,000 files), but we haven't reached that point yet. – Chronocidal Nov 12 '19 at 08:44

2 Answers2

1

As I have mentioned in comments, this memory leak is to do with how Office opens/closes files - if you open and then close a file, even without changing or saving it, some data is left in memory and cannot be dumped without closing the Application.

I suspect (but cannot confirm) that it originates from some sort of "feature" to make reopening files slightly faster.

Now - as I said earlier - you can free that memory by closing the Application, so, that is what we'll do! If we Late-Bind Word to a different Office Application (Excel / Powerpoint / Outlook), we can then close and reopen it mid-macro

Sub macro1()
    Dim objCC As Object 'Late Binding, must be Object
    Dim dataline As String
    Dim doc As Object 'Late Binding, must be Object

    Dim lineCounter AS Long: lineCounter = 0 'So that we can keep track of files!
    Dim MSWord AS Object 'Late Binding, must be Object
    Set MSWord = CreateObject("Word.Application") 'Create an instance of Word
    'MSWord.Visible=True 'OPTIONAL LINE!  Makes Word visible, default is False

    Open "D:\Data\find1" For Input As #1
    While Not eof(1)
        Line Input #1, dataline
        Debug.Print dataline

        Set doc = MSWord.Documents.Open(dataline) 'Open with the correct Application

        Do While doc.ContentControls.Count > 0
            For Each objCC In doc.ContentControls
                objCC.Delete False
            Next objCC
        Loop

        doc.SaveAs MSWord.ActiveDocument.Path + "/" + MSWord.ActiveDocument.Name + ".html", wdFormatHTML
        doc.Close

        lineCounter = lineCounter +1 'Count processed documents
        If (lineCounter mod 100) = 0 Then 'Every 100 documents - adjust as necessary
            'We need to destroy any objects associated with Word to close it safely
            Set objCC = Nothing
            Set doc = Nothing

            MSWord.Quit 'Close Word, to free the junk memory
            DoEvents 'Check in with Windows - we haven't crashed, honest!
            Set MSWord = CreateObject("Word.Application") 'Create a new instance of Word
            'MSWord.Visible=True 'OPTIONAL LINE!  Makes Word visible, default is False
        End If
    Wend
    Close #1
    Set objCC = Nothing
    Set doc = Nothing
    MSWord.Quit 'Close Word, for the final time
End Sub
Chronocidal
  • 6,827
  • 1
  • 12
  • 26
-1

Maybe You should try to work with FilesystemObject, e.g.

Set objFSO  = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(fileSpec, ForReading)
strContents = objFile.ReadAll    
' and so on

objFile.Close
barneyos
  • 586
  • 2
  • 5
  • 7
  • 1
    This suggestion makes no sense in the context of the question, as written. Can you explain why you think this is an answer to the problem? – Cindy Meister Nov 11 '19 at 17:16
  • Maybe Open method makes memory leak. It was loose proposal after reading [this](https://stackoverflow.com/questions/1376756/what-is-a-superfast-way-to-read-large-files-line-by-line-in-vba) – barneyos Nov 12 '19 at 07:38