Get data frame in shape of table in word document

Question

I am reading an excel file, extracting a specific df and putting it in word document. The issues I face are:

DF loses its shape once added to para. becomes totally useless.

Complete code is written below.

#importing required libraries
import pandas as pd
import numpy as np
eod = pd.read_excel('df.xlsx')
import datetime
import docx 
from datetime import date
legal = docx.Document('legal.docx')

#Calculating No. days from SCN
eod['SCN Days'] = (pd.Timestamp('now').floor('d') - eod['SCN Date']).dt.days

#Generation list of EFE for Final Showcause Notice to be issued today
FSCN_today = eod.where(eod['SCN Days']>20)
#Dropping Null from generated list
FSCN_today = FSCN_today.dropna(how ="all")
FSCN_today = FSCN_today[['Exporter Name','EFE','DESTINATION','VALUE']]

#Getting Unique Values in the list generated
s_values = FSCN_today['Exporter Name'].unique()

#Iterating through List
for c in s_values:
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1.iloc[10:1]
    legal.paragraphs[15].text = str(df1)
    notice_name = str(c)+ ".docx"
    legal.save(notice_name)

#Update Date & Status of FSCN Issued today
eod['FSCN Date'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),date.today(),eod['FSCN Date'])
eod['Status'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),"FSCN ISSUED",eod['Status'])

#In progress
name = "EOD "+ str(date.today())+ ".xlsx"
#eod.to_excel(name,index =False)

Following line have error.

legal.paragraphs[15].text = str(df1)

Is it possible to share the sample files you are using in the code. You can replace it with dummy data. The primary reason is to understand the types. Also check if there is a possibility that you do not have 15 paragraphs. — SaaSy Monster, Sep 23 '20 at 08:01
Share the error as well along with the dummy data, the question needs to be more descriptive — Ajinkya, Sep 23 '20 at 13:47
You can check the dummy data at https://github.com/iqbalhusnain/Export-Overdue — Husnain Iqbal, Sep 23 '20 at 14:08
I don't get an error with the code from your GitHub repository. The df-table is in the documents (it doesn't look good, but it's there)? Your code here works too, except for the `paragraphs[8]`-statement that S Mayer pointed out? — Timus, Sep 27 '20 at 13:41

score 0 · Answer 1 · answered Sep 26 '20 at 20:02

0

I noticed that legal.paragraphs[8].text = df1.iloc[10:1] looked odd.

If you change it to legal.paragraphs[8].text = df1[10:1].iloc the resulting .docx files look more reasonable to me.

I do not know what your desired output is, so that's my best guess with what I'm presented.

answered Sep 26 '20 at 20:02

S Mayer

82
6

Paragraph 8 is already good. The issue is with para 15. I want to display the df as a table there. I don't know how to do it. – Husnain Iqbal Sep 27 '20 at 09:10

score 0 · Answer 2 · answered Sep 27 '20 at 15:57

I've never worked with python-docx so I'm pretty sure my attempt is suboptimal. The following did work with the sample data.

Essentially, I've added a table to the document and inserted the column labels and content of the DataFrame into the table. There are some nasty parts that I couldn't solve otherwise (the parts where I access the _-attributes of paragraph and table).

I replaced the following part of your code above

#Iterating through List
for c in s_values:
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1.iloc[10:1]
    legal.paragraphs[15].text = str(df1)
    notice_name = str(c)+ ".docx"
    legal.save(notice_name)

with this (comments for highlighting what I did, line breaks for better readability):

for c in s_values:
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1[10:1].iloc # <- Changed

    # Add a table with the same amount of columns as the DataFrame
    table = legal.add_table(0, len(df1.columns))
    table.autofit = True

    # Create the header line (= column labels of the DataFrame)
    header = table.add_row()
    for col, cell in enumerate(header.cells):
        cell.text = str(df1.columns[col])

    # Insert the content of DataFrame in the table
    for ind in df1.index:
        row = table.add_row()
        for pos, col in enumerate(df1.columns):
            row.cells[pos].text = df1.loc[ind, col]

    # Add a break in paragraph 15 (before the table)
    legal.paragraphs[15].add_run().add_break()

    # Add the table to paragraph 15
    legal.paragraphs[15]._p.addnext(table._tbl)

    notice_name = str(c)+ ".docx"
    legal.save(notice_name)

    # Remove the table
    table._element.getparent().remove(table._element)

DDD1 · Accepted Answer · 2020-09-28T12:57:39.003

You can make this work by creating a table, transfering the dataframe into that table (as explained in this post) and then placing that table right where legal.paragraphs[15] is at:

#importing required libraries
import pandas as pd
import numpy as np
eod = pd.read_excel('df.xlsx')
import datetime
import docx 
from datetime import date

#Calculating No. days from SCN
eod['SCN Days'] = (pd.Timestamp('now').floor('d') - eod['SCN Date']).dt.days

#Generation list of EFE for Final Showcause Notice to be issued today
FSCN_today = eod.where(eod['SCN Days']>20)
#Dropping Null from generated list
FSCN_today = FSCN_today.dropna(how ="all")
FSCN_today = FSCN_today[['Exporter Name','EFE','DESTINATION','VALUE']]

#Getting Unique Values in the list generated
s_values = FSCN_today['Exporter Name'].unique()

#Iterating through List
for c in s_values:
    legal = docx.Document('legal.docx')
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1.iloc[10:1].iloc
    legal.paragraphs[15].text = ""
    t = legal.add_table(df1.shape[0]+1, df1.shape[1])
    for j in range(df1.shape[-1]):
        t.cell(0,j).text = df1.columns[j]
    for i in range(df1.shape[0]):
        for j in range(df1.shape[-1]):
            t.cell(i+1,j).text = str(df1.values[i,j])    
    legal.paragraphs[15]._p.addnext(t._tbl)
    notice_name = str(c)+ ".docx"
    legal.save(notice_name)

#Update Date & Status of FSCN Issued today
eod['FSCN Date'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),date.today(),eod['FSCN Date'])
eod['Status'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),"FSCN ISSUED",eod['Status'])

#In progress
name = "EOD "+ str(date.today())+ ".xlsx"
#eod.to_excel(name,index =False)

(I moved the legal = docx.Document('legal.docx') to the loop as consecutive docx's were keeping the older exporter values)

Get data frame in shape of table in word document

3 Answers3