0

I am trying to have my Excel VBA code read an interactive PDF (i.e. a PDF with dropdowns, calendars...etc.). I normally read PDF files by opening them in Word; however, Word removes all the interactive controls while rendering the file (e.g. a calendar control gets completely removed...along with its value).

I know that I can "flatten" the PDF by opening it in Adobe Acrobat Reader (or any browser) and printing it to a PDF (thus replacing all the controls with their selected values). The question is how do I do this programmatically in VBA. I know there are thousands of questions on S.O. about this topic; but, all of the answers either require installing Adobe Acrobat Pro, using a third-party installed application (e.g. CutePDF), submitting the file to an online API, or most commonly, using SendKeys or Win32 API calls to interact with the "Save As" dialog box that appears. The code I am writing will be distributed to multiple users; so, any additional application installations is out of the question (both Acrobat Pro and other third-party conversion tools). The files contain proprietary data; so, an online API is out as well. As far as SendKeys or SendMessage is concerned...frankly, there's got to be a better way.

I've tried numerous methods including Chrome's headless "print-to-pdf"; but it appears that only accepts HTML files (I even tried embedding the PDF into an HTML file but it still did not work). The closest I've gotten is by using the below code to send data directly to the "Microsoft Print to PDF" driver (it's based on the process described here). The code successfully creates a PDF file...but it's a zero byte file. It seems that the data read from the input file is not being properly accepted by the "WritePrinter" function (although no error occurs). Although the documentation of the "WritePrinter" function states that the "pBuf" parameter should be a byte array, the person who asked this question was passing a string and getting it to work (understanding that they were not trying to print to a GDI printer). My code still just produces a blank PDF file if I convert the byte array to a string or even just read the contents of the input file using FSO's "ReadAll" method. It is also worth nothing that the return value of the "WritePrinter" function is "1" and the value of the "pcWritten" output is the correct number of bytes...it's just that the PDF file that is produced has a file size of 0 bytes.

So, can anyone figure out how to get the "WritePrinter" function to accept the data being read from the input file?

By the way, an enormous gold star to anyone who can figure this out because, based on my research, the internet is begging for a way to do this without using Acrobat Pro or having to interact with the "Save As" dialog!

UPDATE#1: I have found a few posts online where users are experiencing this issue of the "Microsoft Print to PDF" driver generating blank files manually (here and here). Apparently the primary culprit is special characters in the input file name. I wanted to make it clear that I do not believe this is the issue I am having, as I can print to PDF perfectly fine manually...just not via the code in this post. (Also, just so it's been said, neither the input file path/name nor output file path/name contains special characters). Also, I don't believe it's an issue with the specific install of the print driver (as suggested in some posts) as my code creates 0 byte files on 3 different computers with 3 different OS builds of Windows (18363.1082, 18363.1139, and 19041.572)

UPDATE#2: In my continued research of this issue I found this post on MSDN's Visual Studio help forums. I understand it's for C# but one of the contributers states:

So you should convert managed byte array into unmanaged array, then invoke the method

He provides C# code that uses the "Marshal.AllocCoTaskMem" and "Marshal.Copy" functions to "Copy the managed byte array into the unmanaged array". I'm not familiar with the terms "managed" or "unmanaged" byte arrays so I will continue to do some research on those. Does anyone have any experience with the "Marshal.AllocCoTaskMem" and "Marshal.Copy" functions from VBA within Excel (VB6)?

UPDATE#3: It has been brought to my attention that the code I've written will ONLY print XPS files to a PDF. I've converted my original interactive PDF to an XPS manually and confirmed that my code worked perfectly in writing that XPS to a PDF. This now leaves me back at square one: how do I programmatically read an interactive PDF without utilizing any third-party applications or online converters? Anyone have any ideas?


Type DOCINFO
   lpszDocName As String
   lpszOutput As String
   lpszDatatype As String
End Type

Private Declare PtrSafe Function OpenPrinter Lib "winspool.drv" Alias "OpenPrinterA" (ByVal pPrinterName As String, phPrinter As Long, ByVal pDefault As Long) As Long
Private Declare PtrSafe Function StartDocPrinter Lib "winspool.drv" Alias "StartDocPrinterA" (ByVal hPrinter As Long, ByVal Level As Long, pDocInfo As DOCINFO) As Long
Private Declare PtrSafe Function StartPagePrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Private Declare PtrSafe Function WritePrinter Lib "winspool.drv" (ByVal hPrinter As Long, pBuf As Any, ByVal cdBuf As Long, pcWritten As Long) As Long
Private Declare PtrSafe Function EndPagePrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Private Declare PtrSafe Function EndDocPrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long
Private Declare PtrSafe Function ClosePrinter Lib "winspool.drv" (ByVal hPrinter As Long) As Long

Public Sub Print_File_To_PDF(strInputFile As String, strOutputPDF As String)
   Dim udtDocInfo As DOCINFO
   Dim lngPrinterCheck As Long, lngFileNumber As Long, lngPrinterHandle As Long, lngPrinterOutput as Long
   Dim arrBytes() As Byte

   'Get handle of printer
   lngPrinterCheck = OpenPrinter("Microsoft Print To PDF", lngPrinterHandle, 0)
   If lngPrinterCheck = 0 Then
       Exit Sub
   End If

   'Define document info
   With udtDocInfo
       .lpszDocName = CreateObject("Scripting.FileSystemObject").GetBaseName(strOutputPDF)
       .lpszOutput = strOutputPDF
       .lpszDatatype = "RAW"
   End With

   'Read file into byte array
   lngFileNumber = FreeFile
   Open strInputFile For Binary Access Read As lngFileNumber
   ReDim arrBytes(LOF(lngFileNumber))
   Get lngFileNumber, , arrBytes()
   Close lngFileNumber

   'Print byte array to PDF file
   Call StartDocPrinter(lngPrinterHandle, 1, udtDocInfo)
   Call StartPagePrinter(lngPrinterHandle)
   Call WritePrinter(lngPrinterHandle, arrBytes(1), UBound(arrBytes), lngPrinterOutput)
   Call EndPagePrinter(lngPrinterHandle)
   Call EndDocPrinter(lngPrinterHandle)
   Call ClosePrinter(lngPrinterHandle)
End Sub
  • What's the return value of `WritePrinter()`? And can you print other types of file successfully? – Rita Han Oct 23 '20 at 02:12
  • The return value of WritePrinter (as well as all the other Win32 API functions) is >=1. I have tried it with Word files (.docx) and txt files with the same results. I have also tried changing the value of the "lpszDatatype" property of the udtDocInfo from "RAW" to vbNullString and that did not work either. – AskingAQuestion Oct 23 '20 at 13:04
  • Another thing I should mention is that the pcWritten output of the WritePrinter function does return the correct number of bytes...but for some reason the PDF file that is actually written by the function has a file size of 0. So, to recap: the return value of WritePrinter is "1" (indicating success) and the pcWritten output is the correct number of bytes but the PDF that is produced has a file size of 0 bytes. – AskingAQuestion Oct 23 '20 at 13:18
  • The `WritePrinter` call looks awkward. The final argument should be a valid pointer. The value `0` is not. – IInspectable Oct 25 '20 at 21:12
  • Understood; although it makes no difference. I edited my post to have a valid pointer as the last parameter of the "WritePrinter" function. The value of the newly added "lngPrinterOutput" pointer is the correct number of bytes; however, the PDF file that is produced is still a 0 byte file and both Acrobat Reader and Chrome state that it's a "corrupted or empty" file (obviously empty in this case since it's file size is 0 bytes). – AskingAQuestion Oct 25 '20 at 22:32
  • When you use the “Microsoft Print to PDF” driver the following is happening: PDF->EMF->XPS->PDF. Convert a dynamic PDF to a flat PDF file just go from PDF -> PDF so using “Microsoft Print to PDF” doesn’t make any sense. Printing a PDF file in PDF format seems not supported. – Rita Han Oct 29 '20 at 06:03
  • @RitaHan-MSFT: I wonder why, if PDF->PDF is truly unsupported using the "Microsoft Print to PDF" driver, it works "manually" when I use it from either Chrome or Acrobat Reader. I have confirmed that by selecting "Microsoft Print to PDF" from those two programs and specifying an output location in the "Save As" dialog that it does, in fact, "flatten" the PDF. I also want to make it clear that my code still produces a 0 byte file even if the input file is a .txt or .docx file; so, it's something about my code that's causing the problem...not the fact that it's a PDF being printed. – AskingAQuestion Oct 29 '20 at 16:49
  • We are doing more investigation on this issue and will update here if there is any progress. – Rita Han Nov 03 '20 at 01:40
  • @AskingAQuestion, To print something, we have to create a file that contains instructions for a printer. That is the job of the printer driver of the printer. For example, when you print text from Notepad.exe, it uses the Windows Printing APIs. The API calls are translated into driver calls where the printer driver creates such a file. Then, it will be sent to the printer. What WritePrinter() does is send such a file to the printer directly. You can't simply send a text file or any file to a printer. – Fei Xue Nov 06 '20 at 02:06
  • If your printer is able to handle PDF files directly though, you can send the file using WritePrinter(). Microsoft Print to PDF printer is an XPS printer, XPS printers require a file that is in XPS format. – Fei Xue Nov 06 '20 at 02:07
  • @FeiXue-MSFT Firstly, thank you very much for looking into this for me. You are, of course, correct. I converted the PDF to an XPS and my code worked perfectly. So, now, I am back at square one. How can I programmatically read an interactive PDF without using a third-party application or online converter? If I could flatten the file then I could use Word to read it; but, again, the interactive controls of the file make it so that I cannot read it right now. – AskingAQuestion Nov 09 '20 at 15:21
  • @AskingAQuestion, Based on my understanding, there is no such win32 API could achieve this. You may need to consider to use the application(make sure the application support automation) to convert for this scenario, and set-up a service if it needs to supports multiple users. – Fei Xue Nov 10 '20 at 03:08

0 Answers0