13

I need to write a python script to read excel files, find each worksheet and then print these to pdf with the standard formating defined in the excel.

I found the following question How can I open an Excel file in Python? which pointed me to http://www.python-excel.org/

This gives me the ability to find the names of each worksheet.

import xlrd
book = xlrd.open_workbook("myfile.xls")
print "Worksheet name(s):", book.sheet_names()

This results in

Worksheet name(s): [u'Form 5', u'Form 3', u'988172 Adams Road', u'379562 Adams Road', u'32380 Adams Road', u'676422 Alderman Road', u'819631 Appleyard Road', u'280998 Appleyard Road', u'781656 Atkinson Road', u'949461 Barretts Lagoon Road', u'735284 Bilyana Road', u'674784 Bilyana Road', u'490894 Blackman Road', u'721026 Blackman Road']

Now I want to print each worksheet which starts with a number to a pdf.

So I can

worksheetList=book.sheet_names()
for worksheet in worksheetList:
 if worksheet.find('Form')!=0: #this just leaves out worksheets with the word 'form' in it
  <function to print to pdf> book.sheet_by_name(worksheet) #what can I use for this?

or something similar to above...what can I use to achieve this?

The XLRD documentation is confusing it says

Formatting features not included in xlrd version 0.6.1: Miscellaneous sheet-level and book-level items e.g. printing layout, screen panes

and

Formatting

Introduction

This collection of features, new in xlrd version 0.6.1, is intended to provide the information needed to (1) display/render spreadsheet contents (say) on a screen or in a PDF file

see https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966

Which is true? can some other package be used to print to pdf?

For unix I see that there is http://dag.wieers.com/home-made/unoconv/ anything for windows? I found https://gist.github.com/mprihoda/2891437 but can't figure out how to use it yet.

Community
  • 1
  • 1
GeorgeC
  • 956
  • 5
  • 16
  • 40
  • 1
    Hey @GeorgeC, have you found a solution for this? I'm looking for a way to "print" a whole xsl to pdf, so if you submit your solution as an anwer will be helpfull :) – Gustavo Vargas Oct 22 '13 at 19:41
  • @GustavoVargas I didn't use the following as it does not retain formating but xtopdf seemed to be an ok solution - dancingbison.com/products.html and the developer was very helpful as well. – GeorgeC Oct 28 '13 at 02:32

2 Answers2

34

This seems like the place to put this answer.

In the simplest form:

import win32com.client

o = win32com.client.Dispatch("Excel.Application")

o.Visible = False

wb_path = r'c:\user\desktop\sample.xls'

wb = o.Workbooks.Open(wb_path)



ws_index_list = [1,4,5] #say you want to print these sheets

path_to_pdf = r'C:\user\desktop\sample.pdf'



wb.WorkSheets(ws_index_list).Select()

wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)

Including a little formatting magic that scales to fit to a single page and sets the print area:

import win32com.client

o = win32com.client.Dispatch("Excel.Application")

o.Visible = False

wb_path = r'c:\user\desktop\sample.xls'

wb = o.Workbooks.Open(wb_path)



ws_index_list = [1,4,5] #say you want to print these sheets

path_to_pdf = r'C:\user\desktop\sample.pdf'

print_area = 'A1:G50'



for index in ws_index_list:

    #off-by-one so the user can start numbering the worksheets at 1

    ws = wb.Worksheets[index - 1]

    ws.PageSetup.Zoom = False

    ws.PageSetup.FitToPagesTall = 1

    ws.PageSetup.FitToPagesWide = 1

    ws.PageSetup.PrintArea = print_area



wb.WorkSheets(ws_index_list).Select()

wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)

I have also started a module over a github if you want to look at that: https://github.com/spottedzebra/excel/blob/master/excel_to_pdf.py

milowh74
  • 478
  • 5
  • 6
  • 1
    Just adding all of the other PageSetup options: https://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.pagesetup_properties.aspx – Kristen G. Aug 18 '16 at 16:31
  • If the tab has multiple print area, how to define that? – Gavin Sep 26 '17 at 02:55
  • I am not sure but I have a suggestion on how to proceed. I think you could add multiple "print_area" variables. So you could do the following: print_area = A1:G50 print_area2 = A52:G52 You may also have to loop through the sheet index twice or do the page setup bit of code twice in the for loop. These are just ideas. I have not actually tried to print multiple print areas. – milowh74 Sep 28 '17 at 04:06
  • 5
    Hi, since you are using the library: import win32com.client, it seems this is a windows only solution. Can this work on a Linux server that does not have Microsoft office installed. – Palu Jul 26 '18 at 14:55
  • It seems like this is a solution for Excel files stored on disk. That would be a great edit to the answer if someone explained how this could be done with Excel instances so that one would not have to save it on disk before converting to PDF. – precise Nov 16 '18 at 08:23
  • This code works great to save PDF files. However, an instance of excel keeps floating about in the Task-Manager active list. Trying `wb.Close()` just seems to make Python hang and not close anything, `o.Quit()` tries to close Excel, but hangs on the "Do you want to save the file" dialogue. How can I close Excel after opening it? And, separate issue, it opens 32-bits Excel (obviously, given you use `win32com`), is there a 64-bit instance? – Adriaan Jan 04 '22 at 15:44
  • @milowh74 I am trying to export the sheets in a specific order like that `[3, 1, 2]` but it is exported in order 1,2,3. Is it possible to make it limited to the same order of the list? – YasserKhalil Mar 29 '22 at 04:53
  • 1
    @Palu, 4.5 years later, it looks like there is still not simple way to print xlsx with nice pie charts to a pdf on Linux using Python. – Kashyap Jan 12 '23 at 17:48
  • @Kashyap Is there still no option, have you found something? I'm stuckin the same step, I want to convert a xlsx to png, used to do it with win32com but now I have to switch to Linux. – ColinDave Aug 09 '23 at 07:22
  • @ColinDave, nope :-| – Kashyap Aug 09 '23 at 13:42
0

Yes Kashyap, I concluded that in python, it is not possible basically. SO I went into the realm of JVM languages instead.

Palu
  • 668
  • 3
  • 11
  • 26