Python - Converting XLSX to PDF

Question

I have always used win32com module in my development server to easily convert from xlsx to pdf:

o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
o.DisplayAlerts = False
wb = o.Workbooks.Open("test.xlsx")))
wb.WorkSheets("sheet1").Select()
wb.ActiveSheet.ExportAsFixedFormat(0, "test.pdf")
o.Quit()

However, I have deployed my Django app in production server where I don't have Excel application installed and it raises the following error:

File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\__init__.p
y", line 95, in Dispatch
    dispatch, userName = dynamic._GetGoodDispatchAndUserName(dispatch,userName,c
lsctx)
  File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py
", line 114, in _GetGoodDispatchAndUserName
    return (_GetGoodDispatch(IDispatch, clsctx), userName)
  File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py
", line 91, in _GetGoodDispatch
    IDispatch = pythoncom.CoCreateInstance(IDispatch, None, clsctx, pythoncom.II
D_IDispatch)
com_error: (-2147221005, 'Invalid class string', None, None)

Is there any good alternative to convert from xlsx to PDF in Python?

I have tested xtopdf with PDFWriter, but with this solution you need to read and iterate the range and write lines one by one. I wonder if there is a more direct solution similar to win32com.client.

Thanks!

Possible duplicate of [.xlsx and xls(Latest Versions) to pdf using python](https://stackoverflow.com/questions/20854840/xlsx-and-xlslatest-versions-to-pdf-using-python) — joppich, Sep 14 '18 at 07:27
It's not, the exception is different. By the way, that post has not been solved. — DavidRguez, Sep 14 '18 at 07:32
yeah it is... you have a portability issue, and that thread lists all viable options for converting xlsx to pdf. btw OP answered their own question. — joppich, Sep 14 '18 at 07:46
Yes, he asnwered with the same solution that didn't work for me. And as it's not the same problem that's why I opened a new thread. — DavidRguez, Sep 14 '18 at 08:04
Also very related to [convert excel to pdf in python](https://stackoverflow.com/q/2018932/5500092). — Thomas Devoogdt, May 02 '20 at 11:12

score 8 · Answer 1 · answered May 04 '20 at 07:25

8

As my original answer was deleted and is eventually a bit useful, I repost it here.

You could do it in 3 steps:

excel to pandas: pandas.read_excel
pandas to HTML: pandas.DataFrame.to_html
HTML to pdf: python-pdfkit (git), python-pdfkit (pypi.org)

import pandas as pd
import pdfkit

df = pd.read_excel("file.xlsx")
df.to_html("file.html")
pdfkit.from_file("file.html", "file.pdf")

install:

sudo pip3.6 install pandas xlrd pdfkit
sudo apt-get install wkhtmltopdf

answered May 04 '20 at 07:25

Thomas Devoogdt

816
11
16

1

This is very useful. Is it possible to do this in windows.? How can I install wkhtmltopdf in windows.? – S Andrew Sep 28 '20 at 09:15
2

Would this work on an excel file with images, formulas and formatting? – Psddp Feb 06 '21 at 18:53

score 3 · Answer 2 · edited Oct 11 '22 at 17:36

This is a far more efficient method than trying to load a redundant script that is hard to find and was wrtten in Python 2.7.

Load excel spread sheet into a DataFrame
Write the DataFrame to a HTML file
Convert the html file to an image.


    dirname, fname = os.path.split(source)
    basename = os.path.basename(fname)

    data = pd.read_excel(source).head(6)

    css = """

    """

    text_file = open(f"{basename}.html", "w")
    # write the CSS
    text_file.write(css)
    # write the HTML-ized Pandas DataFrame
    text_file.write(data.to_html())
    text_file.close()

    imgkitoptions = {"format": "jpg"}

    imgkit.from_file(f"{basename}.html", f'{basename}.png', options=imgkitoptions)

    try:
        os.remove(f'{basename}.html')
    except Exception as e:
        print(e)

    return send_from_directory('./', f'{basename}.png')

Taken from here https://medium.com/@andy.lane/convert-pandas-dataframes-to-images-using-imgkit-5da7e5108d55

Works really well, I have XLSX files converting on the fly and displaying as image thumbnails on my application.

If posting code in an answer you should include the imports. — osprey, Aug 10 '21 at 17:01

score 2 · Answer 3 · answered Sep 14 '18 at 07:33

2

from openpyxl import load_workbook
from PDFWriter import PDFWriter

workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()

I have been using this and it works fine

answered Sep 14 '18 at 07:33

thirteen4054

465
4
17

1

Yeah, I have tested it and it works fine. The things is that you need to iterate the range to write lines in PDF. I was looking for a equivalent solution to `win32com` where you can choose the sheet you want to export. – DavidRguez Sep 14 '18 at 07:37
1

does the library PDFWriter still exist? – bakarin Sep 27 '18 at 07:44
Anybody knows how to install this PDFWriter? I am not able to find any links. Is pdfrw the same module? – Arindam Roychowdhury Apr 09 '19 at 12:55
@ArindamRoychowdhury https://bitbucket.org/vasudevram/xtopdf/downloads/ download the repo and its in there but i still cant get it going... – Ari Jun 11 '19 at 06:14
@ArindamRoychowdhury Nice, I ended up using Pandas, read_excel(), then converted the dataframe to HTML, then used imgkit to turn the html table into a PNG – Ari Jun 11 '19 at 12:32

score 0 · Answer 4 · answered Jul 04 '23 at 16:58

I'm using Linux, so I couldn't try pywin32. So I found unoserver with LibreOffice

import subprocess

def convert_xlsx_to_pdf(xlsx_file):
    try:
        subprocess.run(["libreoffice", "--headless", "--convert-to", "pdf", xlsx_file])
        print("Done!")

    except Exception as e:
        print("Error:", e)

convert_xlsx_to_pdf("file.xlsx")

Python - Converting XLSX to PDF

4 Answers4

Linked