I have a bunch of .docx documents that I am extracting text from using python-docx. Extraction works fine, but I am having trouble getting hold of the numbering for the paragraphs.

A Word document could look like this:

  1. Some Header: This is the first paragraph.
  2. Second Header: This is the second paragraph.

I am iterating through the paragraphs like this:

print(paragraph.text)
print(paragraph.style)
print(paragraph._p.pPr.numPr.numId.val)

It prints the paragraph style as

_ParagraphStyle('List Paragraph') id: 2280433126816

which is good. It also prints the text correctly.

However, it always prints 1 for

print(paragraph._p.pPr.numPr.numId.val)
Pankaj Singh
  • If I did convert the paragraph to "Heading 1" style, then it works fine. – Pankaj Singh Sep 16 '19 at 18:04
  • If you stop at `print(paragraph._p.pPr.numPr)` do you get what you are looking for? – MyNameIsCaleb Sep 16 '19 at 19:37
  • It looks like what you are trying to do is answered [here](https://stackoverflow.com/questions/52094242/is-there-any-way-to-read-docx-file-include-auto-numbering-using-python-docx) – MyNameIsCaleb Sep 16 '19 at 19:38
  • @MyNameIsCaleb - This is what I get ' at 0x2319a195cc8> – Pankaj Singh Sep 16 '19 at 20:02
  • @MyNameIsCaleb - I tried your proposed solution too. Didn't work – Pankaj Singh Sep 16 '19 at 20:03
  • The real answer is that what you are trying to do is not really supported as of now. If you inspect the XML for the file, you can figure out what the styling is for numbered lists, and then if you inspect any paragraph you can determine which style is being used, so combining those will get you the answer you want. – MyNameIsCaleb Sep 16 '19 at 20:10
  • @MyNameIsCaleb - Yeah, I think you are right. Using xml, it might be easier to achieve – Pankaj Singh Sep 16 '19 at 20:13
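
For reference, a minimal sketch of the XML-based approach discussed in the comments above. In the underlying XML, numId identifies which list a paragraph belongs to, not the item's position in that list, so every item of one list reports the same value; the visible number is computed by Word when the document is rendered. The sketch below recovers a position by counting paragraphs per (numId, ilvl) pair. The helper names list_position, number_items and print_numbered are hypothetical, not part of python-docx. It assumes every list starts at 1, and it ignores the number formats and start overrides defined in numbering.xml as well as numbering inherited from a style rather than set on the paragraph itself.

```python
from collections import defaultdict


def list_position(paragraph):
    """Return (numId, ilvl) for a python-docx paragraph, or None.

    Only numbering set directly on the paragraph (w:pPr/w:numPr) is seen;
    numbering that comes from the paragraph style is not resolved here.
    """
    pPr = paragraph._p.pPr
    if pPr is None or pPr.numPr is None:
        return None
    numPr = pPr.numPr
    if numPr.numId is None or numPr.numId.val == 0:
        # A numId of 0 means numbering was explicitly removed.
        return None
    ilvl = numPr.ilvl.val if numPr.ilvl is not None else 0
    return (numPr.numId.val, ilvl)


def number_items(positions):
    """Yield a dotted label ('1', '2.1', ...) or None for each position.

    `positions` is an iterable of (numId, ilvl) tuples or None, in
    document order. Assumption: counters start at 1 and deeper levels
    restart whenever a shallower level advances.
    """
    counters = defaultdict(dict)  # numId -> {ilvl: running count}
    for pos in positions:
        if pos is None:
            yield None
            continue
        num_id, ilvl = pos
        levels = counters[num_id]
        levels[ilvl] = levels.get(ilvl, 0) + 1
        # Drop counters of deeper levels so they restart next time.
        for deeper in [lvl for lvl in levels if lvl > ilvl]:
            del levels[deeper]
        yield ".".join(str(levels.get(lvl, 1)) for lvl in range(ilvl + 1))


def print_numbered(path):
    """Print each paragraph of the .docx at `path` with its computed label."""
    # python-docx is imported here so the helpers above stay stdlib-only.
    from docx import Document

    paragraphs = Document(path).paragraphs
    labels = number_items(list_position(p) for p in paragraphs)
    for paragraph, label in zip(paragraphs, labels):
        print(label, paragraph.text)
```

Calling print_numbered on a file prints None for unnumbered paragraphs and a label such as 1 or 2.1 for list items. To reproduce exactly what Word shows (letters, Roman numerals, custom start values), the level definitions in numbering.xml would also have to be read.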

0 Answers