
I am trying to use VBA (which I am pretty new to) to produce a series of Word documents from PDFs that are not scanned images. In other words, I want to loop over several PDF files and save each one in MS Word format. In my experience Word reads my PDF documents quite well and keeps the correct layout most of the time. I am not sure this is the right way to tackle the problem, so I would also welcome an alternative suggestion, using R if possible.

Anyway, here is the code I found:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf") 'pdf path

   Do While (file <> "")

   ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

          Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

    ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

    ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

After pasting it into the VBA editor (the Developer window), I save the code in a module, close the editor, click the "Macros" button and run the "convertToWord" macro. A pop-up box then shows the error "Sub or Function not defined". How do I fix this? Earlier, for a reason that is no longer clear to me, I also got an error about ChangeFileOpenDirectory, which also seemed to be undefined.

Update 27/08/2017

I changed the code to the following:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf")

   ChDir "C:\Users\username\work_dir_example"

   Do While (file <> "")

        Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

        ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

Now I no longer get an error pop-up, but no output appears in my working directory. What might be wrong now?

John Doe
  • (a) Is the `Dir("C:\Users\...t" & "*.pdf")` implying that your directory ends with a `t`? If so, that should say `Dir("C:\Users\...t\" & "*.pdf")` (or, to save a tiny bit of processing time, `Dir("C:\Users\...t\*.pdf")`). (b) I'm not sure why the `ChangeFileOpenDirectory` would fail, other than perhaps the directory you specified didn't exist or you didn't have access to it. – YowE3K Aug 26 '17 at 02:09
  • Just delete the two `ChangeFileOpenDirectory ...` lines. Open and save files using a full path. – jsotola Aug 26 '17 at 03:48
  • I tried some of the suggestions. I'll update the question. – John Doe Aug 27 '17 at 23:52
  • Also, I changed the directory path to `C:\Users\username\work_dir_example` to avoid misunderstandings. – John Doe Aug 28 '17 at 00:01
  • I know you requested VBA, but as you mentioned you are open to alternatives: if you have access to a *nix machine and LibreOffice, you could give this a try: https://stackoverflow.com/questions/26358281/convert-pdf-to-doc-python-bash/26358582#26358582 – kevdoran Aug 28 '17 at 14:19
  • I will consider it, but that is a bit of a long shot, since I would need to install Python (right?), which I am somewhat familiar with but not completely comfortable using. – John Doe Aug 28 '17 at 15:06
  • As I said in my first comment, `Dir("C:\Users\username\work_dir_example" & "*.pdf")` should be `Dir("C:\Users\username\work_dir_example\*.pdf")` (or `Dir("C:\Users\username\work_dir_example\" & "*.pdf")`, which is the same thing). Without the `"\"`, you are looking for `.pdf` files in the `"C:\Users\username"` directory which have filenames starting with `"work_dir_example"`. – YowE3K Aug 29 '17 at 01:51
  • After fixing the `"\"` typo you say in a comment to my answer "Now the error is `Run Time Error '424' object required` and it occurs in the first command inside the while loop, the `Documents.Open`". The only object that is required in that line is `Documents`, which should be available from the Word object library. That, coupled with your mention of "I got an error related to the function `ChangeFileOpenDirectory`, which seemed not to be defined also" in the question, makes me think your MS Word installation may have been corrupted. – YowE3K Sep 01 '17 at 18:57
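The path problem YowE3K describes boils down to making sure a `\` separates the folder from the `*.pdf` wildcard. A minimal sketch, using a hypothetical helper named `PdfPattern` (the name is illustrative and does not come from any of the posts), that adds the separator only when it is missing:

```vba
' Hypothetical helper (not from the original posts): return a Dir() pattern
' that matches every PDF *inside* folderPath, adding the "\" separator
' only when the caller left it off.
Function PdfPattern(ByVal folderPath As String) As String
    If Right$(folderPath, 1) <> "\" Then folderPath = folderPath & "\"
    PdfPattern = folderPath & "*.pdf"
End Function
```

For example, `file = Dir(PdfPattern("C:\Users\username\work_dir_example"))` searches inside the folder instead of looking in `C:\Users\username` for files whose names start with `work_dir_example`.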

2 Answers


Any language that can read PDF files and write Word documents (a DOCX file is a zipped package of XML) can do this, but reproducing the conversion you like (the one Word performs when it opens a PDF) requires driving Word itself through its API. VBA is your easiest option.

The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That is already set up for any code you write in a Word document, so create a new Word document and add the code in a standard module. (See this Excel tutorial if you need more details; the steps for our process are the same.)

You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.

Now for the code!

As stated in the comments, your second snippet is valid if you just ensure that your folder paths end with a backslash "\" character. It's still not great code after you fix that, but that'll get you up and running.

I'll assume you want to go the extra mile and have a well-written version of this you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).

Main procedure:

Sub ConvertPDFsToWord2()
    Dim path As String
    'Manually edit path in the next line before running
    path = "C:\users\username\work_dir_example\"

    Dim file As String
    Dim doc As Word.Document
    Dim regValPDF As Integer
    Dim pdfWarningChanged As Boolean
    Dim originalAlertLevel As WdAlertLevel

'Generate string for getting all PDFs with Dir command
    'Check for terminal \
    If Right(path, 1) <> "\" Then path = path & "\"
    'Append file type with wildcard
    file = path & "*.pdf"

    'Get name of first PDF (blank string if no PDFs exist)
    file = Dir(file)

    originalAlertLevel = Application.DisplayAlerts
    Application.DisplayAlerts = wdAlertsNone

    'From here on, any error jumps to ErrHandler so settings get restored
    On Error GoTo ErrHandler

    If file <> "" Then
        regValPDF = TogglePDFWarning(1)
        pdfWarningChanged = True
    End If

    Do While file <> ""
        'Open method will automatically convert PDF for editing
        Set doc = Documents.Open(path & file, False)

        'Save and close document
        doc.SaveAs2 path & Replace(file, ".pdf", ".docx"), _
                    fileformat:=wdFormatDocumentDefault
        doc.Close False
        Set doc = Nothing

        'Get name of next PDF (blank string if no PDFs remain)
        file = Dir
    Loop

CleanUp:
    On Error Resume Next 'Ignore errors during cleanup
    If Not doc Is Nothing Then doc.Close False
    'Restore registry value only if we changed it and it was originally off
    If pdfWarningChanged And regValPDF <> 1 Then TogglePDFWarning regValPDF
    Application.DisplayAlerts = originalAlertLevel
    Exit Sub

ErrHandler:
    MsgBox "Conversion stopped: " & Err.Description, vbExclamation
    Resume CleanUp 'Leave error-handling mode, then run the cleanup code
End Sub

Registry setting function:

Private Function TogglePDFWarning(newVal As Integer) As Integer
'This function reads and writes the registry value that controls
'the dialog displayed when Word opens (and converts) a PDF file
    Dim wShell As Object
    Dim regKey As String
    Dim regVal As Variant

    'setup shell object and string for key
    Set wShell = CreateObject("WScript.Shell")
    regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
             Application.Version & "\Word\Options\"

    'Get existing registry value, if any
    On Error Resume Next 'Ignore error if reg value does not exist
    regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
    If Err.Number <> 0 Then regVal = 0 'Treat a missing value as 0
    On Error GoTo 0      'Break on errors after this point (also clears Err)

    wShell.RegWrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"

    'Return original setting / registry value (0 if omitted)
    If regVal = 0 Then
        TogglePDFWarning = 0
    Else
        TogglePDFWarning = 1
    End If

End Function
AjimOthy
  • I have a problem with the `TogglePDFWarning` function. Do I insert this in another module? Do I need a library to properly call it? – John Doe Sep 20 '17 at 14:52
  • I've solved the problem. I needed to enter the private function as a procedure. Since you've managed to obtain the answer (and I couldn't check in due time whether you were right), is there a way to 'donate' the additional 25 points to you? – John Doe Sep 20 '17 at 15:20
  • While running your code I get an error message on the line `Set doc = Documents.Open(path & file, False)` which says "runtime error '-2147221164 (80040154)': class not defined". What am I doing wrong? – Capt.Krusty Jun 24 '21 at 07:31
  • @Capt.Krusty It's not finding the Documents class, which probably means there's an issue with either your reference to the Word library or the related DLL file. I'd try walking through the steps again in a fresh DOCM file first. If that doesn't work, you're in for some troubleshooting outside the code. :( – AjimOthy Jun 26 '21 at 00:07

As others have stated, the problem seems to lie mostly with the path & file name. Here is the second version of the code you posted with some changes.

Unfortunately, a warning message pops up, and setting DisplayAlerts to false does not suppress it. However, if you tick the "don't show this message again" checkbox the first time it appears, it will not pop up again for every file.

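Sub convertToWord()

    Dim file        As String
    Dim path        As String

    'Folder containing the PDFs; note the trailing backslash
    path = "C:\Users\username\work_dir_example\"
    'Name of the first PDF in the folder (empty string if there are none)
    file = Dir(path & "*.pdf")

    Do While (file <> "")
        'Opening a PDF makes Word convert it to an editable document
        Documents.Open FileName:=path & file
        With ActiveDocument
            'Rename only the file part, so ".pdf" in the folder path is untouched
            .SaveAs2 FileName:=path & Replace(file, ".pdf", ".docx"), _
                                FileFormat:=wdFormatXMLDocument
            .Close
        End With
        'Name of the next PDF (empty string when none remain)
        file = Dir
    Loop

End Sub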
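Both answers build the output name with `Replace`, which compares case-sensitively by default, while `Dir("*.pdf")` on Windows also matches files ending in `.PDF`. For a file named `Report.PDF` the name would therefore come back unchanged, and `SaveAs2` would target the source PDF's own file name. `Replace` also rewrites every `.pdf` in the string, not just the extension. A hedged sketch of an alternative, using a hypothetical helper named `DocxName` (not part of either answer) that swaps only the final extension and assumes, as with `Dir("*.pdf")` results, that the name has one:

```vba
' Hypothetical helper (not from either answer): replace only the final
' extension, whatever its case, e.g. "Report.PDF" -> "Report.docx".
' Assumes pdfName ends in an extension, as Dir("*.pdf") results do.
Function DocxName(ByVal pdfName As String) As String
    Dim dotPos As Long
    dotPos = InStrRev(pdfName, ".") 'Position of the last dot
    If dotPos > 0 Then
        DocxName = Left$(pdfName, dotPos - 1) & ".docx"
    Else
        DocxName = pdfName & ".docx" 'No extension: just append one
    End If
End Function
```

Inside the loop it would be used as `.SaveAs2 FileName:=path & DocxName(file), FileFormat:=wdFormatXMLDocument`.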
J. Garth