1

Preserve table format while reading and writing from existing docx to a new docx

I am trying the code below on the following table, which is inside my demo.docx,

but I am not getting the output in the same format. I need help fixing this so that I can copy the table to my new docx in the same format.
ITEM         NEEDED
Books        1
Pens         3
Pencils      2
Highlighter  2 colors
Scissors     1 pair

The code I am using is below:

  import docx

  # Open the source document and print the text of every table cell
  doc = docx.Document('demo.docx')
  for table in doc.tables:
      for row in table.rows:
          for cell in row.cells:
              for para in cell.paragraphs:
                  print(para.text)

I was going through "Parsing of table from .docx file", but I need to create the table inside a new docx, and I am not sure how to do that.

RonyA

1 Answer

2

I think I have a clunky way of doing it: convert the original docx table to a pandas DataFrame first, then write the DataFrame into a table in a new document.

From what I gather, python-docx returns the text of every table cell as a string, so we have to treat the data as strings. Because the cells are collected into one flat list, you also need to know the number of rows and columns of the table in order to reshape it.
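To illustrate the reshaping step on its own, here is a minimal pure-Python sketch. The helper name `to_rows` is made up for this illustration, and the cell values are the ones from the table in the question, assumed to have been read row by row.

```python
def to_rows(cells, n_cols):
    """Group a flat list of cell strings into rows of n_cols items each.

    to_rows is a hypothetical helper, not part of python-docx or pandas.
    """
    if len(cells) % n_cols != 0:
        raise ValueError("cell count is not a multiple of the column count")
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


# Cell text from the table in the question, read row by row.
cells = ["ITEM", "NEEDED", "Books", "1", "Pens", "3", "Pencils", "2",
         "Highlighter", "2 colors", "Scissors", "1 pair"]
rows = to_rows(cells, n_cols=2)
for row in rows:
    print(row)
```

If the column count is wrong, the rows come out misaligned or the helper raises, which is why the shape has to be known before rebuilding the table.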

Assuming the original document file is called "Stationery.docx", this might do the trick.

import docx
import numpy as np
import pandas as pd

doc = docx.Document("Stationery.docx")

table = doc.tables[0]

# Collect the text of every cell, row by row, into a flat list.
# cell.text joins all paragraphs of a cell, so each cell yields exactly one entry.
ls = []
for row in table.rows:
    for cell in row.cells:
        ls.append(cell.text)


def Doctable(ls, row, column):
    # Reshape the flat list to the table shape
    df = pd.DataFrame(np.array(ls).reshape(row, column))
    new = docx.Document()
    word_table = new.add_table(rows=row, cols=column)
    # Write each DataFrame value into the matching cell of the new table
    for x in range(row):
        for y in range(column):
            word_table.cell(x, y).text = str(df.iloc[x, y])
    return new, df


# Example usage; "new.docx" is only a placeholder output name
new_doc, df = Doctable(ls, len(table.rows), len(table.columns))
new_doc.save("new.docx")
Kah