I'm trying to create a script to convert PDF to plain text, then copy the plain text into Word. (We do a lot of reformatting corrupt documents from scratch where I work.) I have a script that's working perfectly except for one thing: when pasting into Word, it doesn't paste the whole file. With longer files, I'm only getting part of the text.
'string to hold file path
Dim strDMM
strDMM = "[path]"
'make this directory if it doesn't exits
On Error Resume Next
MkDir strDMM
On Error GoTo 0
'get the file name to process
Dim TheFile
TheFile = InputBox("What is the file name?" & chr(13) & chr(13) & "(Example: [name].pdf)", "Name of File")
'declare some acrobat variables
Dim AcroXApp
Dim AcroXAVDoc
Dim AcroXPDDoc
'open acrobat
Set AcroXApp = CreateObject("AcroExch.App")
AcroXApp.Hide
'open the document we want
Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
AcroXAVDoc.Open "[path to desktop]" & TheFile, "Acrobat" 'users are instructed to save to the Desktop for ease of access here
'make sure the acrobat window is active
AcroXAVDoc.BringToFront
'I don't know what this does. I copied it from code online.
Set AcroXPDDoc = AcroXAVDoc.GetPDDoc
'activate JavaScript commands w/Acrobat
Dim jsObj
Set jsObj = AcroXPDDoc.GetJSObject
'save the file as plain text
jsObj.SaveAs strDMM & "pdf-plain-text.txt", "com.adobe.acrobat.plain-text"
'close the file and exit acrobat
AcroXAVDoc.Close False
AcroXApp.Hide
AcroXApp.Exit
'declare constants for manipulating the text files
Const ForReading = 1
Const ForWriting = 2
'Create a File System Object
Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")
'read file and get text
dim objFile
set objFile=objFSO.OpenTextFile( strDMM & TheFile, ForReading)
Dim strText
strText=objFile.ReadAll
'Create a Word Object
Dim objWord
set objWord = CreateObject("Word.Application")
'make Word visible
With objWord
.Visible = True
End With
'Add method used to create a blank document
Dim objDoc
Set objDoc=objWord.Documents.Add()
'create a shorter variable to pass commands to Word
Dim objSelection
set objSelection=objWord.Selection
'type the read text into Word; this is the part that's failing
objSelection.TypeText strText
objFile.Close
I've tried multiple files with the same result. The funny thing is, it pastes the same material from file A each time, but when copying from file B, it pastes a different amount of material. In other words, if A gives me 8 pages of 60 on the first run, I get those same 8 pages each time. File B might give me 14 pages of 60, then it gives me the same 14 pages each time. This only changes if I delete material from the .txt file. If I delete several paragraphs from A, then run the script, I might get 12 pages. Then I get those same 12 every time. But there's no pattern (that I can discern) to predict where it cuts off.
I can't find any EOF characters, and when I read from notepad and write to notepad, the whole thing is copied perfectly. The problem is somewhere in the transfer to Word.
Is there something I'm missing? Is there a limit to the size of a string that Word can write with TypeText? (I would think that if that were the case, I wouldn't get documents of varying length, right? Shouldn't they all stop at n characters if that's the limit?)
I've read about additional libraries that let VBS work with the clipboard, but I'm a total noob and don't know if that's a more elegant solution or how to make it work. I'm also not sure that on my work computer I have the necessary access to install those libraries.
Any help is appreciated!