I need to process tables in many Word files. Some of them were created as native Word tables, which python-docx can read.

However, some of them were inserted from Excel, and I don't know why python-docx cannot read them. Here is the code I wrote to test this. As the terminal output shows, the list variable `tables` is empty.

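from docx import Document

docFile = 'a.docx'
document = Document(docFile)
# document.tables lists the native Word tables found in the document body
tables = document.tables
print(tables)  # prints [] when the only table was inserted from Excel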

Can anyone help? Thanks a lot!

FunPlus
  • Try: ``` Tables = document.tables[0] ``` [Related](https://stackoverflow.com/questions/27861732/parsing-of-table-from-docx-file) – DialFrost Aug 05 '22 at 06:14
  • I'm not familiar with python-docx, but I suspect it doesn't consider the embedded spreadsheets to be tables. They are stored inside the .docx zip archive as `word/embeddings/*.xlsx`. If python-docx doesn't provide a way to read them, you can use [`zipfile`](https://docs.python.org/3/library/zipfile.html) and [`openpyxl`](https://openpyxl.readthedocs.io/) instead. – GordonAitchJay Aug 05 '22 at 06:18
  • Could I have your `docx` file? – AnhPC03 Aug 05 '22 at 08:00
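
The `zipfile` plus `openpyxl` route suggested in the comments could look roughly like the sketch below. It assumes the Excel table was embedded as an `.xlsx` part under `word/embeddings/`; the function names `list_embedded_workbooks` and `read_embedded_tables` are made up for illustration, and `openpyxl` is a third-party package that has to be installed separately. If Word stored the object in another form (for example an OLE `.bin` file), this sketch will not find it.

```python
import io
import zipfile


def list_embedded_workbooks(docx_path):
    """Return the archive names of .xlsx files embedded in a .docx."""
    with zipfile.ZipFile(docx_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("word/embeddings/")
            and name.lower().endswith(".xlsx")
        ]


def read_embedded_tables(docx_path):
    """Map each embedded workbook name to the rows of its active sheet."""
    # Imported here so the listing helper above works without openpyxl.
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    tables = {}
    with zipfile.ZipFile(docx_path) as archive:
        for name in list_embedded_workbooks(docx_path):
            # load_workbook accepts a file-like object, so no temp file is needed.
            data = io.BytesIO(archive.read(name))
            workbook = load_workbook(data, read_only=True, data_only=True)
            sheet = workbook.active
            tables[name] = [list(row) for row in sheet.iter_rows(values_only=True)]
            workbook.close()
    return tables
```

Calling `read_embedded_tables('a.docx')` would then return a dictionary keyed by the embedded file name, with each value a list of rows. Only the active sheet is read here; looping over `workbook.worksheets` would cover the others.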

1 Answer

I'm fighting the same issue while using Pages on OSX to create a .docx template. I've found that Format > Arrange > Object Placement needs to be set to "Move with text" for the table. Giving the table any other alignment or formatting makes it disappear in Python: python-docx then reads it as paragraphs that contain nothing. Having compared the XML of both versions and looked at the python-docx code, I suspect `w:tblInd` is involved, but I don't know enough to take it much further. There are recent GitHub issues covering this, so hopefully it will get sorted.
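
To compare the XML of a working and a non-working file, a small standard-library sketch like the one below may help. It reports the parent element of every `w:tbl` in `word/document.xml`. The helper name `locate_tables` is made up for illustration, and the idea that a table which is not a direct child of `w:body` gets skipped by `document.tables` is an assumption to check against the python-docx documentation, not a confirmed diagnosis of this issue.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def locate_tables(docx_path):
    """Return the parent tag name of every w:tbl in the main document part."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    parents = []
    for parent in root.iter():
        for child in parent:
            if child.tag == f"{{{W_NS}}}tbl":
                # Strip the "{namespace}" prefix, keeping e.g. "body".
                parents.append(parent.tag.split("}")[-1])
    return parents
```

An empty list would mean the file contains no `w:tbl` element at all (for example, because the table is an embedded object), while a parent other than `body` would mean the table is wrapped inside some other element.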

Javad