
My goal is to capture all text from a large collection of Word documents, by document and by paragraph within each document, and save it in txt format.

I understand that I will have to capture:

(1) header and footer text using code like:

doc.Sections(1).Footers(1).Range.Text

(2) document body text (I intend to do it by paragraph):

For Each p In doc.Paragraphs
    txt = p.Range.Text
Next p
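To make the goal concrete, here is a minimal sketch of the paragraph loop writing one line per paragraph to a text file. The names `ExportBodyParagraphs` and `outPath` are hypothetical, and the choice of `Scripting.FileSystemObject` with Unicode (UTF-16) output is an assumption, not a requirement. It runs inside Word VBA and covers only the main body.

```vba
Option Explicit

' Sketch only: writes each body paragraph of doc as one line of outPath.
' ExportBodyParagraphs and outPath are illustrative names.
Sub ExportBodyParagraphs(ByVal doc As Document, ByVal outPath As String)
    Dim fso As Object
    Dim ts As Object
    Dim p As Paragraph
    Dim txt As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    ' Arguments: overwrite existing file = True, write as Unicode = True
    Set ts = fso.CreateTextFile(outPath, True, True)

    For Each p In doc.Paragraphs
        txt = p.Range.Text
        ' Strip the trailing paragraph mark (and the end-of-cell
        ' character that follows text inside table cells).
        Do While Len(txt) > 0 And _
                 (Right$(txt, 1) = vbCr Or Right$(txt, 1) = Chr$(7))
            txt = Left$(txt, Len(txt) - 1)
        Loop
        ts.WriteLine txt
    Next p

    ts.Close
End Sub
```

It could be called as `ExportBodyParagraphs ActiveDocument, "C:\temp\out.txt"`, where the path is only a placeholder.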

(3) text in various text boxes and shapes:

doc.Shapes(1).TextFrame.TextRange.Text
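Extended to every shape, that last item might look like the sketch below. `ListShapeText` is a hypothetical name. As far as I can tell, `doc.Shapes` covers only floating shapes anchored in the main text, so shapes in headers and footers, inline shapes, and grouped shapes would need separate handling. The `HasText` check is there to skip shapes that carry no text.

```vba
Option Explicit

' Sketch only: prints the text of every floating shape in the main
' text of doc to the Immediate window. ListShapeText is an
' illustrative name.
Sub ListShapeText(ByVal doc As Document)
    Dim shp As Shape

    For Each shp In doc.Shapes
        ' Pictures, lines, and similar shapes have no text to read.
        If shp.TextFrame.HasText Then
            Debug.Print shp.Name & ": " & shp.TextFrame.TextRange.Text
        End If
    Next shp
End Sub
```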

Are there any other Word objects that may contain text? Is there a better way to achieve my goal? I considered saving each Word document as plain text, but the text contained in text boxes was lost in the process.

Update: footnotes and endnotes also need to be handled. Anything else?
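One way to check which text containers a given document actually has is to walk its `StoryRanges` collection, which groups text by story type (main text, footnotes, endnotes, comments, text frames, and the various headers and footers). The sketch below only reports what it finds, and I have not verified that it reaches every object. `ListAllStories` is a hypothetical name.

```vba
Option Explicit

' Sketch only: lists each story in doc with its WdStoryType value
' and character count. ListAllStories is an illustrative name.
Sub ListAllStories(ByVal doc As Document)
    Dim story As Range

    For Each story In doc.StoryRanges
        ' Stories of the same type are chained (for example, linked
        ' text frames or the headers of later sections), so follow
        ' NextStoryRange until it returns Nothing.
        Do While Not story Is Nothing
            Debug.Print "StoryType " & story.StoryType & _
                        ", characters: " & Len(story.Text)
            Set story = story.NextStoryRange
        Loop
    Next story
End Sub
```

The `StoryType` numbers map to the `WdStoryType` constants, for example 1 for `wdMainTextStory`, 2 for `wdFootnotesStory`, 3 for `wdEndnotesStory`, and 5 for `wdTextFrameStory`.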

HTC User
  • Does this answer your question? [How to extract just plain text from .doc & .docx files?](https://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files) – HackSlash Nov 05 '19 at 23:00
  • `doc.Content.Text` – HackSlash Nov 05 '19 at 23:01
  • @HackSlash This is very promising, though it will take me some time to try it because I work in a Windows environment. I mean the suggestion that uses the Unix command. – HTC User Nov 05 '19 at 23:15
