5

Is there an efficient way to get the number of pages of a Word document (.doc, .docx) with Python?

And what about an .odt file?

I want to use this for a web application based on Web2py on Linux.

Thank you!

Xavier R.
  • 2
    For docx, there is a python module [`docx`](https://github.com/mikemaccana/python-docx) that gives you access to the XML of the Word document. This may or may not have the number of pages. – Sam Mussmann Oct 18 '12 at 22:22

2 Answers

7

For anyone who lands on this entry through a search:

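from win32com.client import Dispatch

# doc_path should be an absolute path to the .doc/.docx file
# open Word without showing its window
word = Dispatch('Word.Application')
word.Visible = False
doc = word.Documents.Open(doc_path)

try:
    # repaginate, then get the number of pages
    doc.Repaginate()
    num_of_pages = doc.ComputeStatistics(2)  # 2 = wdStatisticPages
finally:
    # close the document without saving and shut Word down
    doc.Close(False)
    word.Quit()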
a_guest
  • Excellent answer using the pywin32 package. Worked perfectly well for me. Thank you. – stratis Aug 06 '14 at 15:52
  • I tried the win32com solution and I got this error: `'' object has no attribute 'Repaginate'` Did `Repaginate` get deprecated? – Jed May 17 '17 at 19:41
  • 2
    The drawback of this solution is that it only works on Windows with MS Word installed. The other answer, which reads the page count from the file itself, does not depend on the operating system (so it also works on Linux), is faster, and does not require MS Word. – Dick Kniep Nov 25 '19 at 14:17
  • Great answer! One addition from my side: call word.Close(True) at the end – Nil Nov 26 '20 at 10:02
4

You can read the value

<Properties>
<Pages>CountValue</Pages>

from docProps/app.xml in the docx package or

<office:document-meta>
    <office:meta>
        <meta:document-statistic meta:page-count="CountValue">

from meta.xml in the odt package.

If these values do not exist (they are optional), you have to compute the page count from the entire document, which in effect means rendering it, and that is much more difficult.
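
Reading those two values could look roughly like the sketch below. It uses only the standard library and treats both formats as zip packages. The function names `docx_page_count` and `odt_page_count` are made up for this example, and the numbers returned are whatever the application that last saved the file stored there, so they may be stale or missing (in which case the functions return `None`).

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces of the two metadata files
APP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def _read_part(path, name):
    """Parse one XML part of a zip package; return None if the part is absent."""
    with zipfile.ZipFile(path) as package:
        try:
            return ET.fromstring(package.read(name))
        except KeyError:  # the part is optional
            return None


def docx_page_count(path):
    """Return the stored <Pages> value from docProps/app.xml, or None."""
    root = _read_part(path, "docProps/app.xml")
    if root is None:
        return None
    pages = root.find(f"{{{APP_NS}}}Pages")
    text = (pages.text or "").strip() if pages is not None else ""
    return int(text) if text.isdigit() else None


def odt_page_count(path):
    """Return the stored meta:page-count attribute from meta.xml, or None."""
    root = _read_part(path, "meta.xml")
    if root is None:
        return None
    stat = root.find(f".//{{{META_NS}}}document-statistic")
    value = stat.get(f"{{{META_NS}}}page-count", "") if stat is not None else ""
    return int(value) if value.isdigit() else None
```

For example, `docx_page_count("report.docx")` (a hypothetical file name) returns an integer when the count is stored and `None` otherwise. This does not cover the old binary .doc format, which is not a zip package.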

pogorskiy