My organization sometimes needs to use Excel to generate a bunch of formatted statements (in the sense of a document stating "your account balance is $X"), print them to PDF, and then combine them into one big PDF. The method generally used involves a single sheet driven by an index cell and a list of people/data on another sheet. A VBA macro iterates the index cell from 1 to N, then uses the Adobe Distiller API to print the formatted sheet each time and combine the results.
For various reasons I would like to implement most of this macro's logic in C# in our VSTO Excel Add-In, so that the VBA side of the process would be reduced to a few lines.
I decided to expose an API that looks roughly like this:
AcroPDDoc PdfBegin(Worksheet worksheet, string filename);
void PdfAddPage(AcroPDDoc pdf, Worksheet worksheet);
void PdfComplete(AcroPDDoc pdf);
The idea being you write VBA of the form:
Sub PrintToPdf()
Dim obj As IMySharedObject
Set obj = Application.COMAddIns("MyAddIn").Object
Dim pdf As Acrobat.AcroPDDoc
Dim i As Long
For i = 1 To 10
Range("counter").Value = i
If i = 1 Then
Set pdf = obj.PdfBegin(Sheets("Statement"), "C:\myFile.pdf")
Else
PdfAddPage pdf, Sheets("Statement")
End If
Next i
PdfComplete pdf
End Sub
I grew curious/worried about the lifecycle of the AcroPDDoc
object and about open file handles, Acrobat.exe processes etc. in the event that the macro hit an error or was terminated mid-execution. Not super worried because "close Excel and reopen it" is an acceptable solution if necessary. I wrote the following code in C#:
internal static class Printing
{
private static WeakReference weakref;
public static AcroPDDoc PdfBegin(Worksheet worksheet, string filename)
{
SetAdobeOutputFile(filename);
worksheet.PrintOut(ActivePrinter: "Adobe PDF");
AcroPDDoc pdf = new AcroPDDoc();
pdf.Open(filename);
weakref = new WeakReference(pdf);
return pdf;
}
public static void GC()
{
System.GC.Collect();
}
public static void test(AcroPDDoc pdf)
{
if (weakref != null) {
System.Diagnostics.Debug.WriteLine("IsAlive pre: " + weakref.IsAlive);
if (weakref.IsAlive) System.Diagnostics.Debug.WriteLine("ReferenceEquals: " + Object.ReferenceEquals(pdf, weakref.Target));
}
GC.Collect();
if (weakref != null) System.Diagnostics.Debug.WriteLine("IsAlive post: " + weakref.IsAlive);
}
}
I've elided a bunch of extra Debug.WriteLine
s and some other extraneous code. I tested it with the following VBA:
Sub foo()
Dim obj As IUDFSharedObject
Set obj = Application.COMAddIns("MyAddIn").Object
Dim pdf As Acrobat.AcroPDDoc
Set pdf = obj.PdfBegin(Sheets("Statement"), "C:\myFile.pdf")
'obj.GC
'obj.test pdf
End Sub
What I found generally was that .NET does not include the reference sent off to VBA-land in its reference count for garbage collection.
For example, if I uncomment just obj.GC
and obj.test pdf
, I am informed that weakref
is not alive.
However, if I uncomment only obj.test pdf
, weakref
is alive both before and after (and I emit "ReferenceEquals: true").
Note that pdf
is in scope in VBA the entire time. I originally tested to see what happens if you let pdf
escape VBA scope too, but it turns out not to matter.
This is a much, much bigger problem for me than a resource link. Is there any solution short of permanently storing each AcroPDDoc
object generated in a List
somewhere to keep the reference count above zero?