
I have a Word docx file and I want to retrieve all of its paragraphs with OpenXml in C#. I need to know: 1. The number of pages in the document. 2. The page number each paragraph belongs to.

Can you show an example where the paragraphs of the document are read?

zequion

1 Answer


Unfortunately, as the answers to "Why only some page numbers stored in XML of docx file?" explain, a docx file does not contain reliable page numbers. The XML files carry no page numbers of their own; pages only exist once Microsoft Word opens the document and renders it dynamically. Reading the OpenXml documentation, for example https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.pagenumber?view=openxml-2.8.1 , does not change that.

You can unzip a few docx files and search them for "page" or "pg" to see this for yourself. I did this with several different kinds of docx files in my own work, and they all showed the same thing. Glad if this helps.
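
The search described above can be scripted. Here is a minimal sketch in Python (standard library only, not C#), since a docx is just a zip archive of XML parts. The function name `find_page_hints` and the three patterns are my own choices for illustration. It assumes the usual `w:` prefix and the unprefixed `<Pages>` element that Word normally writes. Whatever it finds is only a hint: `<Pages>` in docProps/app.xml and `w:lastRenderedPageBreak` are values Word cached the last time it saved the file, and other tools may omit them or leave them stale.

```python
import re
import zipfile

# Markers that relate to pagination. All of them are hints, not a real layout.
PATTERNS = {
    "cached_page_count": re.compile(r"<Pages>(\d+)</Pages>"),          # docProps/app.xml
    "last_rendered_break": re.compile(r"<w:lastRenderedPageBreak\b"),  # cached by Word on save
    "explicit_page_break": re.compile(r'<w:br\b[^>]*w:type="page"'),   # break inserted by the user
}


def find_page_hints(docx):
    """Return {part_name: {marker: [matches]}} for every XML part with a hit.

    `docx` may be a file path or a binary file-like object.
    """
    hints = {}
    with zipfile.ZipFile(docx) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip images and other binary parts
            xml = archive.read(name).decode("utf-8", errors="replace")
            found = {key: rx.findall(xml) for key, rx in PATTERNS.items()}
            found = {key: hits for key, hits in found.items() if hits}
            if found:
                hints[name] = found
    return hints
```

Calling `find_page_hints("some.docx")` (the file name is a placeholder) on a few files saved by Word and a few produced by other tools should show how inconsistent these markers are.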


A few months ago I reworked a Python package called docx2python to do something similar: I reproduced a structured (leveled) XML file from a docx file. As far as I know, a paragraph contains several Runs, and each Run contains only one text. Plain paragraphs are not hard to read; this document shows how to do it: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.paragraph?view=openxml-2.8.1 . Glad if this helps.
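
To illustrate the paragraph / Run / text structure, here is a rough sketch in Python (standard library only, not C#) that reads word/document.xml directly. The elements `w:p`, `w:r` and `w:t` are what the OpenXml SDK exposes as `Paragraph`, `Run` and `Text`, so the same traversal should carry over. The function name `read_paragraphs` is hypothetical. The page value is only an estimate: it counts `w:lastRenderedPageBreak` markers, which Word caches on save, so it can be wrong or stuck at 1 for files that Word has not paginated. Paragraphs nested inside text boxes are not handled specially.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P = f"{{{W}}}p"                          # paragraph
T = f"{{{W}}}t"                          # text inside a run
BREAK = f"{{{W}}}lastRenderedPageBreak"  # page break cached by Word


def read_paragraphs(docx):
    """Return a list of (estimated_page, text), one entry per paragraph.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    page = 1
    for node in root.iter():  # document order
        if node.tag == P:
            paragraphs.append({"page": page, "text": ""})
        elif node.tag == BREAK:
            page += 1
            # A break before any text means the paragraph starts on the new page.
            if paragraphs and not paragraphs[-1]["text"]:
                paragraphs[-1]["page"] = page
        elif node.tag == T and paragraphs:
            paragraphs[-1]["text"] += node.text or ""
    return [(p["page"], p["text"]) for p in paragraphs]
```

The page number of the last entry is then the best available guess at the total page count, under the same caveat.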

Szymon