
I'm trying to create a script that converts a PDF to plain text and then copies the plain text into Word. (Where I work, we do a lot of reformatting of corrupt documents from scratch.) The script works perfectly except for one thing: when it pastes into Word, it doesn't paste the whole file. With longer files, I get only part of the text.

'string to hold file path
Dim strDMM
strDMM = "[path]"

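'make this directory if it doesn't exist
'(VBScript has no MkDir statement, so use the FileSystemObject)
With CreateObject("Scripting.FileSystemObject")
    If Not .FolderExists(strDMM) Then .CreateFolder strDMM
End With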

'get the file name to process
Dim TheFile
TheFile = InputBox("What is the file name?" & chr(13) & chr(13) & "(Example: [name].pdf)", "Name of File")

'declare some acrobat variables
Dim AcroXApp
Dim AcroXAVDoc
Dim AcroXPDDoc

'open acrobat
Set AcroXApp = CreateObject("AcroExch.App")
AcroXApp.Hide

'open the document we want
Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
AcroXAVDoc.Open "[path to desktop]" & TheFile, "Acrobat" 'users are instructed to save to the Desktop for ease of access here

'make sure the acrobat window is active
AcroXAVDoc.BringToFront

'get the PDDoc (the underlying document) behind the AVDoc (its window); the JavaScript object comes from the PDDoc
Set AcroXPDDoc = AcroXAVDoc.GetPDDoc

'activate JavaScript commands w/Acrobat
Dim jsObj
Set jsObj = AcroXPDDoc.GetJSObject

'save the file as plain text
jsObj.SaveAs strDMM & "pdf-plain-text.txt", "com.adobe.acrobat.plain-text"

'close the file and exit acrobat
AcroXAVDoc.Close False
AcroXApp.Hide
AcroXApp.Exit

'declare constants for manipulating the text files
Const ForReading = 1
Const ForWriting = 2

'Create a File System Object
Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")

'read file and get text
dim objFile
set objFile=objFSO.OpenTextFile( strDMM & "pdf-plain-text.txt", ForReading)

Dim strText
strText=objFile.ReadAll

'Create a Word Object
Dim objWord
set objWord = CreateObject("Word.Application")

'make Word visible
With objWord
    .Visible = True
End With

'Add method used to create a blank document
Dim objDoc
Set objDoc=objWord.Documents.Add()

'create a shorter variable to pass commands to Word
Dim objSelection
set objSelection=objWord.Selection

'type the read text into Word; this is the part that's failing
objSelection.TypeText strText

objFile.Close

I've tried multiple files with the same result. The funny thing is, it pastes the same material from file A each time, but when copying from file B, it pastes a different amount of material. In other words, if A gives me 8 pages of 60 on the first run, I get those same 8 pages each time. File B might give me 14 pages of 60, then it gives me the same 14 pages each time. This only changes if I delete material from the .txt file. If I delete several paragraphs from A, then run the script, I might get 12 pages. Then I get those same 12 every time. But there's no pattern (that I can discern) to predict where it cuts off.

I can't find any EOF characters, and when I read from Notepad and write to Notepad, the whole thing is copied perfectly. The problem is somewhere in the transfer to Word.

Is there something I'm missing? Is there a limit to the size of a string that Word can write with TypeText? (I would think that if that were the case, I wouldn't get documents of varying length, right? Shouldn't they all stop at n characters if that's the limit?)

I've read about additional libraries that let VBS work with the clipboard, but I'm a total noob and don't know if that's a more elegant solution or how to make it work. I'm also not sure that on my work computer I have the necessary access to install those libraries.

Any help is appreciated!
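One way to narrow this down is to check whether ReadAll actually returns the whole file, and whether the text contains control characters that Notepad displays harmlessly but that might trip up Word. The script below is a hypothetical diagnostic sketch, not part of the original script: `ListControlChars` and `CheckTextFile` are names made up for this example, and it assumes the file is read in the default ASCII (ANSI) mode, where one byte normally becomes one character, so the two counts should match.

```vbscript
Option Explicit

Const ForReading = 1

'Hypothetical helper: returns "position:code" pairs for every control
'character other than tab (9), line feed (10) and carriage return (13)
Function ListControlChars(s)
    Dim i, code, result
    result = ""
    For i = 1 To Len(s)
        code = AscW(Mid(s, i, 1))
        If code < 0 Then code = code + 65536 'AscW is signed above &H7FFF
        If code < 32 And code <> 9 And code <> 10 And code <> 13 Then
            If result <> "" Then result = result & ","
            result = result & i & ":" & code
        End If
    Next
    ListControlChars = result
End Function

'Compare the file size with what ReadAll returned, then list control characters
Sub CheckTextFile(strPath)
    Dim objFSO, objFile, strText
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(strPath, ForReading)
    strText = objFile.ReadAll
    objFile.Close
    WScript.Echo "File size (bytes):  " & objFSO.GetFile(strPath).Size
    WScript.Echo "Characters read:    " & Len(strText)
    WScript.Echo "Control characters: " & ListControlChars(strText)
End Sub

'Usage: cscript check-text.vbs "C:\full\path\to\pdf-plain-text.txt"
If WScript.Arguments.Count > 0 Then CheckTextFile WScript.Arguments(0)
```

Run it with cscript so the output goes to the console. If the counts match, the string was read in full, and any control character listed near the point where Word stops would be worth investigating; that is a hypothesis to test, not a confirmed cause.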

tmoore82

2 Answers


There is no need to read the file into a string first; Word can insert a text file straight from disk:

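Dim objWord
Dim FileNameAndPath

'full path of the text file to insert, e.g. the one Acrobat saved
FileNameAndPath = strDMM & "pdf-plain-text.txt"

Set objWord = CreateObject("Word.Application")

With objWord
   'make Word visible
   .Visible = True

   'Add method used to create a blank document
   .Documents.Add
   'insert the entire file at the insertion point
   .Selection.InsertFile FileNameAndPath
End With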
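A variant of the same approach, sketched under a few assumptions: it checks that the text file exists, keeps Word hidden while it works, and inserts through an empty Range at the start of the new document (Document.Range and Range.InsertFile are part of Word's object model) rather than through the Selection. The path is a placeholder for the file Acrobat saved.

```vbscript
Option Explicit

Dim objFSO, objWord, objDoc, strTxt
strTxt = "C:\temp\pdf-plain-text.txt" 'placeholder path: use your strDMM folder

'stop early if Acrobat did not produce the text file
Set objFSO = CreateObject("Scripting.FileSystemObject")
If Not objFSO.FileExists(strTxt) Then
    WScript.Echo "Text file not found: " & strTxt
    WScript.Quit 1
End If

Set objWord = CreateObject("Word.Application")
objWord.Visible = False 'stay hidden while the file is inserted

'Range(0, 0) is an empty range at the start of the new document
Set objDoc = objWord.Documents.Add()
objDoc.Range(0, 0).InsertFile strTxt

objWord.Visible = True 'show the finished document
```

Because nothing goes through a VBScript string, this sidesteps any question of how much text a single TypeText call can handle.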
Fionnuala

The basic problem, which you hinted at, is that the String data type is limited to 65,400 characters. With an unknown file length, it is better to read in one line at a time and write it to Word. There is a good discussion of something similar here. The following code should help you get where you want to go:

'read file and get text
dim objFile
set objFile=objFSO.OpenTextFile( strDMM & "pdf-plain-text.txt", ForReading)

'Don't do this!
'Dim strText
'strText=objFile.ReadAll

'Create a Word Object
Dim objWord
set objWord = CreateObject("Word.Application")

'make Word visible
With objWord
   .Visible = True
End With

'Add method used to create a blank document
Dim objDoc
Set objDoc=objWord.Documents.Add()

'create a shorter variable to pass commands to Word
Dim objSelection
set objSelection=objWord.Selection

'Read one line at a time from the text file and
'type that line into Word until the end of the file is reached
Dim strLine
Do Until objFile.AtEndOfStream
   strLine = objFile.ReadLine
   objSelection.TypeText strLine
Loop

objFile.Close

Hope that helps!

Community

That was exactly what I needed! Thank you! I just made a couple of changes. After objSelection.TypeText strLine: objSelection.TypeParagraph (to preserve paragraph breaks). I also decided to keep Word hidden until the end, so I changed the current Visible command to False, then added a new one to set it to True as the last action in the script. That leaves me with one more question. If the String type is limited to 65,400 characters, and that limit is what was keeping the whole thing from copying, why does it work from one instance of Notepad to another? Thanks again! – tmoore82 Sep 05 '12 at 18:47
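Folding the two changes from the comment above back into the answer's code gives something like the sketch below. The file path is a placeholder assumption; in the full script it would be strDMM & "pdf-plain-text.txt". As an extra precaution, blank lines skip TypeText and only get a paragraph mark.

```vbscript
Option Explicit

Const ForReading = 1

Dim objFSO, objFile, objWord, objDoc, objSelection, strLine
Set objFSO = CreateObject("Scripting.FileSystemObject")
'placeholder path: use strDMM & "pdf-plain-text.txt" in the full script
Set objFile = objFSO.OpenTextFile("C:\temp\pdf-plain-text.txt", ForReading)

Set objWord = CreateObject("Word.Application")
objWord.Visible = False 'hidden while the text is typed

Set objDoc = objWord.Documents.Add()
Set objSelection = objWord.Selection

'type one line at a time; ReadLine drops the line break, so add it back
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If Len(strLine) > 0 Then objSelection.TypeText strLine 'skip empty lines
    objSelection.TypeParagraph
Loop

objFile.Close
objWord.Visible = True 'last action: show the finished document
```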