3

My organization sometimes needs to use Excel to generate a bunch of formatted statements (in the sense of a document stating "your account balance is $X"), print them to PDF, and then combine them into one big PDF. The method generally used involves a single sheet driven by an index cell and a list of people/data on another sheet. A VBA macro iterates the index cell from 1 to N, then uses the Adobe Distiller API to print the formatted sheet each time and combine the results.

For various reasons I would like to implement most of this macro's logic in C# in our VSTO Excel Add-In, so that the VBA side of the process would be reduced to a few lines.

I decided to expose an API that looks roughly like this:

AcroPDDoc PdfBegin(Worksheet worksheet, string filename);
void PdfAddPage(AcroPDDoc pdf, Worksheet worksheet);
void PdfComplete(AcroPDDoc pdf);

The idea being you write VBA of the form:

Sub PrintToPdf()
    Dim obj As IMySharedObject
    Set obj = Application.COMAddIns("MyAddIn").Object

    Dim pdf As Acrobat.AcroPDDoc

    Dim i As Long
    For i = 1 To 10
        Range("counter").Value = i

        If i = 1 Then
            Set pdf = obj.PdfBegin(Sheets("Statement"), "C:\myFile.pdf")
        Else
            PdfAddPage pdf, Sheets("Statement")
        End If
    Next i

    PdfComplete pdf
End Sub

I grew curious/worried about the lifecycle of the AcroPDDoc object and about open file handles, Acrobat.exe processes etc. in the event that the macro hit an error or was terminated mid-execution. Not super worried because "close Excel and reopen it" is an acceptable solution if necessary. I wrote the following code in C#:

internal static class Printing
{
    private static WeakReference weakref;

    public static AcroPDDoc PdfBegin(Worksheet worksheet, string filename)
    {
        SetAdobeOutputFile(filename);
        worksheet.PrintOut(ActivePrinter: "Adobe PDF");

        AcroPDDoc pdf = new AcroPDDoc();
        pdf.Open(filename);
        weakref = new WeakReference(pdf);

        return pdf;
    }

    public static void GC()
    {
        System.GC.Collect();
    }

    public static void test(AcroPDDoc pdf)
    {
        if (weakref != null) {
            System.Diagnostics.Debug.WriteLine("IsAlive pre: " + weakref.IsAlive);
            if (weakref.IsAlive) System.Diagnostics.Debug.WriteLine("ReferenceEquals: " + Object.ReferenceEquals(pdf, weakref.Target));
        }

        GC.Collect();

        if (weakref != null) System.Diagnostics.Debug.WriteLine("IsAlive post: " + weakref.IsAlive);
    }
}

I've elided a bunch of extra Debug.WriteLines and some other extraneous code. I tested it with the following VBA:

Sub foo()
    Dim obj As IUDFSharedObject
    Set obj = Application.COMAddIns("MyAddIn").Object

    Dim pdf As Acrobat.AcroPDDoc
    Set pdf = obj.PdfBegin(Sheets("Statement"), "C:\myFile.pdf")
    'obj.GC
    'obj.test pdf
End Sub

What I found generally was that .NET does not include the reference sent off to VBA-land in its reference count for garbage collection.

For example, if I uncomment just obj.GC and obj.test pdf, I am informed that weakref is not alive.

However, if I uncomment only obj.test pdf, weakref is alive both before and after (and I emit "ReferenceEquals: true").

Note that pdf is in scope in VBA the entire time. I originally tested to see what happens if you let pdf escape VBA scope too, but it turns out not to matter.

This is a much, much bigger problem for me than a resource link. Is there any solution short of permanently storing each AcroPDDoc object generated in a List somewhere to keep the reference count above zero?

Alex Godofsky
  • 708
  • 4
  • 16
  • 1
    ++ for the well formed question. One question might be why not eliminate the usage of VBA completely and rely only on the .NET calls. IE. you could initialize and set reference to a worksheet/cell in your .NET add-in and call the PDF API from .NET? –  Oct 10 '14 at 06:59
  • In fact, I am going to expose such a function (although I didn't mention it above). The reason I need these functions too is that I already know of a bunch of cases where the logic doesn't precisely confirm to this pattern - e.g. some indexes are skipped, or there's additional logic to decide which worksheet to print out for each index. – Alex Godofsky Oct 10 '14 at 13:27

2 Answers2

2

Thanks to @yms above I've figured out what is going on and have come up with a solution I'm fairly happy with. First, the slight modification to the API:

void PdfBegin(AcroPDDoc pdf, Worksheet worksheet, string filename);
void PdfAddPage(AcroPDDoc pdf, Worksheet worksheet);
void PdfComplete(AcroPDDoc pdf);

Each C# method will call Mashal.ReleaseComObject(pdf) before returning. I did read Marshal.ReleaseComObject considered dangerous, but I've tested for the particular failure mode he calls out and found it does not appear to occur in practice.

The VBA now has to supply the AcroPDDoc object from the beginning. Thus the typical usage will look like:

Sub PrintToPdf()
    Dim obj As IMySharedObject
    Set obj = Application.COMAddIns("MyAddIn").Object

    Dim pdf As New AcroPDDoc

    Dim i As Long
    For i = 1 To 10
        Range("counter").Value = i

        If i = 1 Then
            obj.PdfBegin pdf, Sheets("Statement"), "C:\myFile.pdf"
        Else
            obj.PdfAddPage pdf, Sheets("Statement")
        End If
    Next i

    obj.PdfComplete pdf
End Sub

Essentially just that the declaration is now As New AcroPDDoc rather than As AcroPDDoc with a later Set.

Testing revealed that VBA is very prompt about decrementing the refcount of the AcroPDDoc as soon as it falls out of scope or the reference is set to Nothing. This includes cases where an error is raised mid-subroutine and the user ends execution.

Finally, the Acrobat.exe process is also prompt about killing itself off as soon as its refcount reaches zero, even if it has a file open.

Alex Godofsky
  • 708
  • 4
  • 16
1

Note that your references to AcroPDDoc in .Net are actually wrappers to an out of process COM object that you are passing through different ecosystems, .Net framework does not fully control the lifetime of the underlying object, the reference counting is controlled by the COM server, and as long as there is a COM reference to one object, either from VBA or from .Net, the object will stay alive.

I believe you will find this question and its answers interesting: RCW & reference counting when using COM interop in C#

Community
  • 1
  • 1
yms
  • 10,361
  • 3
  • 38
  • 68
  • This is actually something I'm a bit confused about. I'm aware generally of how COM objects work in .Net and knew about the wrappers. What I half-expected was that my attempt to return an `Acrobat.AcroPDDoc` to VBA-land would fail because the object returned was actually a .Net wrapper. But I am able to use the returned object just fine - for example, I can call `pdf.GetFileName()` in VBA and it behaves as expected. So I thought maybe the COM object is un-wrapped before being sent to VBA, but as you can see in test2 the object remains reference-equal after the C# -> VBA -> C# round-trip. – Alex Godofsky Oct 10 '14 at 01:02
  • I think you are counting the references to the wrapper, not to the COM object itself – yms Oct 10 '14 at 01:09
  • (because I hit the comment length limit above) the link you gave, like the other things I found searching before I put up this question, deals with "what happens to a COM object when it goes to .Net". I am trying to figure out what happens to an object going in the opposite direction, from .Net to COM. Normally for an object to do that at all, I would have to decorate its class with `[ComVisible]` and register it and so on, but that COM registration has already been done by Acrobat. – Alex Godofsky Oct 10 '14 at 01:10
  • re: counting references to the wrapper, that's something I've been thinking about but it doesn't fully make sense to me. The `ReferenceEquals` test makes me pretty sure that literally the same pointer is being passed into VBA and back out. Which would mean that VBA is able to successfully call `GetFileName()` believing that the pointer it has is a pointer to a regular `Acrobat.AcroPDDoc` object, not a .Net wrapper object. – Alex Godofsky Oct 10 '14 at 01:15
  • OK, reading through your link, and everything it linked to, and googling around some more, I think I've figured it out. When a COM object is passed to C#, .NET actually runs through some master list of RCWs it has to see if any of them already point to that object. If one does, C# is handed that RCW, otherwise a new one is created. Now things make a lot more sense. – Alex Godofsky Oct 10 '14 at 02:01
  • @Alex Godofsky Maybe you should write a new answer with your findings – yms Oct 10 '14 at 13:24