15

I have been trying for several hours to convert a .pptx file to a .pdf file with a Python script, but nothing seems to work.

What I have tried: 1) this script, which calls win32com.client, and 2) unoconv, but neither of them works for me.

Problems encountered: The script from the first option raises an error (com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147024894), None)), whereas with the second option Python can't seem to recognize unoconv even after installing it using pip.

I also saw that some people recommend Pandoc, but I can't figure out how to use it from Python.

Versions I am using: Python 2.7.9, Windows 8.1

majom
  • I am wondering if it wouldn't be easier to write it in VBA, as you could use the export to PDF object. You would just need to set up the framework for opening and closing the files in a directory and then have it run through the export process. – AMR Jul 18 '15 at 04:52
  • @AMR: I have never used VBA, so I didn't think of doing something like that. If you don't mind could you give an example of doing that? If I have the VBA file then I can open that file from python as you suggested. Thanks for your comment. –  Jul 18 '15 at 05:08
  • 1
    I haven't coded in VBA in several years. I was trying to look through some old code that I had, but I can't find the work that I did accessing the filesystem. – AMR Jul 18 '15 at 05:35
  • 1
    Try reasking this question on stack exchange Super User and reframe it as a VBA question. I see more VBA questions over there. – AMR Jul 18 '15 at 05:37
  • 1
    Thanks for your suggestions –  Jul 18 '15 at 05:38
  • 1
    Also try this post. There are a lot of similarities to writing Python code to VBA. You only need to learn a few of the Objects in the Object Model and that shouldn't be more than a few hours if you are already advanced enough to be tackling challenges like this. http://stackoverflow.com/questions/25526335/vba-object-model-reference-documentation – AMR Jul 18 '15 at 05:40
  • 1
    @AMR: I solved it with the help of `comtypes` and [this post](http://stackoverflow.com/questions/6011115/doc-to-pdf-using-python). –  Jul 18 '15 at 06:37
  • 1
    You should write up the answer to this question. Glad you found a solution! – AMR Jul 18 '15 at 06:59
  • 1
    This article can answer your question as it shows how to do the same for Word files: http://stackoverflow.com/questions/6011115/doc-to-pdf-using-python – mstuebner Jul 19 '15 at 20:31
  • @AMR: The answer is posted as promised. –  Jul 25 '15 at 07:18

8 Answers

27

I found the answer with the help of this post and the answer from this question.

Note that comtypes is only available for Windows. Other platforms will not support this.

import os
import comtypes.client

def PPTtoPDF(inputFileName, outputFileName, formatType = 32):
    powerpoint = comtypes.client.CreateObject("Powerpoint.Application")
    powerpoint.Visible = 1

    if outputFileName[-3:] != 'pdf':
        outputFileName = outputFileName + ".pdf"
    # PowerPoint resolves relative paths against its own working directory,
    # so pass absolute paths to avoid "file not found" COM errors.
    deck = powerpoint.Presentations.Open(os.path.abspath(inputFileName))
    deck.SaveAs(os.path.abspath(outputFileName), formatType) # formatType = 32 for ppt to pdf
    deck.Close()
    powerpoint.Quit()
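
For anyone hitting the com_error from the question: the last number in that tuple is an HRESULT, and decoding it suggests that PowerPoint could not find the file, which usually points to a relative path being passed. Here is a minimal sketch of the decoding; decode_hresult is a hypothetical helper name for illustration, not part of comtypes or pywin32.

```python
import os

def decode_hresult(value):
    """Split a signed 32-bit HRESULT into (hex string, facility, code)."""
    unsigned = value & 0xFFFFFFFF         # reinterpret as unsigned 32-bit
    facility = (unsigned >> 16) & 0x7FF   # 7 is FACILITY_WIN32
    code = unsigned & 0xFFFF              # 2 is ERROR_FILE_NOT_FOUND
    return hex(unsigned), facility, code

print(decode_hresult(-2147024894))  # ('0x80070002', 7, 2)

# Passing an absolute path avoids depending on PowerPoint's working directory.
# "Test.pptx" is only a placeholder file name.
input_path = os.path.abspath("Test.pptx")
```

If the decoded code is 2, checking that the input path is absolute and that the file exists is probably the first thing to try.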
Community
  • 2
    Where does the number 32 come from? Is there a list of formats available somewhere? – Oskar Persson Sep 10 '18 at 16:42
  • @OskarPersson The number comes from the PpSaveAsFileType enumeration, the complete list is here: https://learn.microsoft.com/en-us/office/vba/api/powerpoint.ppsaveasfiletype – kibibu Feb 25 '19 at 03:06
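
To avoid the bare number 32 in code, a few of the PpSaveAsFileType values can be given names. This is only a partial sketch of the enumeration linked above; the class is defined locally for illustration and is not something comtypes or pywin32 provides under this name.

```python
from enum import IntEnum

class PpSaveAsFileType(IntEnum):
    # Partial list; see Microsoft's PpSaveAsFileType page for all values.
    ppSaveAsPresentation = 1
    ppSaveAsJPG = 17
    ppSaveAsPNG = 18
    ppSaveAsOpenXMLPresentation = 24
    ppSaveAsPDF = 32
    ppSaveAsXPS = 33

# Pass a plain int to the COM call, e.g.:
# deck.SaveAs(outputFileName, int(PpSaveAsFileType.ppSaveAsPDF))
print(int(PpSaveAsFileType.ppSaveAsPDF))  # 32
```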
6

I was working with this solution, but I needed to find all .pptx and .ppt files and convert them all to .pdf (Python 3.7.5). Hope it works...

import os
import win32com.client

ppttoPDF = 32

for root, dirs, files in os.walk(r'your directory here'):
    for f in files:

        if f.endswith(".pptx"):
            try:
                print(f)
                in_file=os.path.join(root,f)
                powerpoint = win32com.client.Dispatch("Powerpoint.Application")
                deck = powerpoint.Presentations.Open(in_file)
                deck.SaveAs(os.path.join(root, f[:-5] + ".pdf"), ppttoPDF) # formatType = 32 for ppt to pdf; explicit .pdf keeps names like file_v_1.3 intact
                deck.Close()
                powerpoint.Quit()
                print('done')
                os.remove(os.path.join(root,f))
            except Exception:
                print('could not open')
                # os.remove(os.path.join(root,f))
        elif f.endswith(".ppt"):
            try:
                print(f)
                in_file=os.path.join(root,f)
                powerpoint = win32com.client.Dispatch("Powerpoint.Application")
                deck = powerpoint.Presentations.Open(in_file)
                deck.SaveAs(os.path.join(root, f[:-4] + ".pdf"), ppttoPDF) # formatType = 32 for ppt to pdf; explicit .pdf keeps names like file_v_1.3 intact
                deck.Close()
                powerpoint.Quit()
                print('done')
                os.remove(os.path.join(root,f))
            except Exception:
                print('could not open')
                # os.remove(os.path.join(root,f))
        else:
            pass

The try/except is there for documents that could not be read, so that the script does not exit until it has gone through the last document. I would also recommend handling each format separately: first .pptx and then .ppt (or vice versa).

  • This works, however this approach creates problems if there is a dot (.) in the file name (file_v_1.3.pptx). The work around is to rename the file first and than rename it again in the end. Is there a better way of doing this? – valenzio Oct 28 '20 at 07:17
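
One possible alternative to renaming, sketched under the assumption that the trouble comes from the output name having no explicit .pdf extension: build the output path with os.path.splitext, which splits only at the last dot, and append ".pdf" yourself. pdf_path_for is a hypothetical helper name.

```python
import os

def pdf_path_for(root, filename):
    """Return the .pdf path for a .ppt/.pptx file, keeping dots in the name."""
    stem, _ext = os.path.splitext(filename)  # splits only at the last dot
    return os.path.join(root, stem + ".pdf")

# "slides" is a placeholder directory name.
print(pdf_path_for("slides", "file_v_1.3.pptx"))
```

The result can then be passed to deck.SaveAs in place of the sliced file name.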
3

I believe the answer has to be updated because comtypes doesn't work anymore.

So this is the code which works (updated version of the accepted answer) :

import win32com.client

def PPTtoPDF(inputFileName, outputFileName, formatType = 32):
    powerpoint = win32com.client.DispatchEx("Powerpoint.Application")
    powerpoint.Visible = 1

    if outputFileName[-3:] != 'pdf':
        outputFileName = outputFileName + ".pdf"
    deck = powerpoint.Presentations.Open(inputFileName)
    deck.SaveAs(outputFileName, formatType) # formatType = 32 for ppt to pdf
    deck.Close()
    powerpoint.Quit()
Sankar
2

Have a look at the following snippet. It uses unoconv, and it works as expected on Ubuntu 20.04.

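# requirements
# sudo apt install unoconv
# pip install tqdm
# (glob and subprocess are in the standard library; no install needed)
import glob
import subprocess
import tqdm
path = "<INPUT FOLDER>"
extension = "pptx"
# recursive=True makes "**" match files in all subfolders
files = glob.glob(path + "/**/*.{}".format(extension), recursive=True)
for f in tqdm.tqdm(files):
    # passing a list avoids shell quoting problems with spaces in file names
    subprocess.run(["unoconv", "-f", "pdf", f])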

This snippet can be adapted to other format conversions.

Original Snippet

vikram meena
  • I dont seem to get output when running this snippet. Where should I be able to find the created pdf? – Ger Sep 05 '21 at 22:16
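
On where the PDF ends up: as far as I know, unoconv writes the converted file next to the input file by default, and its -o option redirects the output. A minimal sketch under that assumption; build_unoconv_command and convert are hypothetical helper names.

```python
import shutil
import subprocess

def build_unoconv_command(input_file, output_dir=None):
    """Build the unoconv argument list for a PDF conversion."""
    cmd = ["unoconv", "-f", "pdf"]
    if output_dir is not None:
        cmd += ["-o", output_dir]  # assumed: -o accepts an output directory
    cmd.append(input_file)
    return cmd

def convert(input_file, output_dir=None):
    """Run unoconv, failing loudly if it is missing or the conversion fails."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv was not found on PATH")
    subprocess.run(build_unoconv_command(input_file, output_dir), check=True)
```

With check=True a failed conversion raises an exception instead of passing silently, which makes a missing output file easier to diagnose.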
1

I needed a way to save a PPTX file as a PDF, and as a PDF with notes. Here is my solution:

from comtypes.client import CreateObject, Constants

def PPTtoPDF(inputFileName, outputFileName, formatType = 32):
    powerpoint = CreateObject('Powerpoint.Application')
    constants = Constants(powerpoint)
    powerpoint.Visible = 1

    if outputFileName[-3:] != 'pdf':
        outputFileName = outputFileName + ".pdf"
    deck = powerpoint.Presentations.Open(inputFileName)
    deck.SaveAs(outputFileName, constants.PpSaveAsPDF)
    deck.Close()
    powerpoint.Quit()


def PPTtoPDFNote(inputFileName, outputFileName, formatType = 32):
    powerpoint = CreateObject('Powerpoint.Application')
    constants = Constants(powerpoint)
    powerpoint.Visible = 1

    if outputFileName[-3:] != 'pdf':
        outputFileName = outputFileName + ".pdf"
    deck = powerpoint.Presentations.Open(inputFileName)
    deck.ExportAsFixedFormat(
        outputFileName,
        constants.ppFixedFormatTypePDF,
        constants.ppFixedFormatIntentPrint,
        False, # No frame
        constants.ppPrintHandoutHorizontalFirst,
        constants.ppPrintOutputNotesPages,
        constants.ppPrintAll
    )
    deck.Close()
    powerpoint.Quit()

To use it,

PPTtoPDF('.\\Test.pptx', '.\\Test.pdf')
PPTtoPDFNote('.\\Test.pptx', '.\\Test_with_Note.pdf')

Note: It is generally best to do the conversion on Windows through PowerPoint itself (here via comtypes), so that new formats and features in Microsoft PowerPoint are supported.

yoonghm
1

Try this code; it works for me.

import os
import win32com.client as win32
import comtypes

# make sure to initialize COM through comtypes
comtypes.CoInitialize()


# Path to input PowerPoint document
input_path = 'path/to/input/document.pptx'

# Path to output PDF file
output_path = 'path/to/output/document.pdf'

# Open PowerPoint document and convert to PDF
powerpoint = win32.Dispatch('Powerpoint.Application')
presentation = powerpoint.Presentations.Open(input_path)
presentation.SaveAs(output_path , 32)
presentation.Close()
powerpoint.Quit()
0

unoconv is a great tool for this task, and it is indeed built in Python. Your problem might be related to a recurring issue with the way the Python interpreter is set in the main unoconv file after installation.

To run it with python3 interpreter, replace #!/usr/bin/env python with #!/usr/bin/env python3 or #!/usr/bin/python3 in unoconv file (/usr/bin/unoconv).

One-liner:

sudo sed -i -e '1s:#!/usr/bin/env python$:#!/usr/bin/env python3:' /usr/bin/unoconv

You could also symlink /usr/bin/unoconv to /usr/local/bin/unoconv.
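
If you would rather apply the same first-line change from Python, here is a minimal sketch that mirrors the sed one-liner on a string; fix_shebang is a hypothetical helper name, and writing the result back to /usr/bin/unoconv would still need root privileges.

```python
def fix_shebang(script_text):
    """Switch a '#!/usr/bin/env python' first line to python3."""
    lines = script_text.split("\n")
    # Only an exact match is changed, like the '$' anchor in the sed pattern.
    if lines and lines[0] == "#!/usr/bin/env python":
        lines[0] = "#!/usr/bin/env python3"
    return "\n".join(lines)

print(fix_shebang("#!/usr/bin/env python\nimport sys\n").split("\n")[0])
```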

Jean-Christophe Meillaud
0

For converting .pptx/.docx to PDF in a Google Cloud Function, I referred to this GitHub repo: https://github.com/zdenulo/gcp-docx2pdf/tree/master/cloud_function. It uses the Google Drive API. The repo uses the MIME type of .docx to convert a .docx file to a .pdf file through Google Drive; you can use other MIME types as well, such as the one for .pptx (see https://developers.google.com/drive/api/v3/mime-types), to convert other files through Google Drive. The rest of the code is the same as in the GitHub repo.
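
As a rough illustration of swapping the MIME type, the lookup below maps a file extension to its Office MIME type. The names are hypothetical and not taken from the repo, and the Drive API calls themselves are left out because I have not reproduced the repo's code here.

```python
import os

# Standard MIME types for the Office Open XML formats.
OFFICE_MIME_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
PDF_MIME_TYPE = "application/pdf"

def source_mime_type(filename):
    """Return the MIME type to declare when uploading the given file."""
    ext = os.path.splitext(filename)[1].lower()
    return OFFICE_MIME_TYPES[ext]  # KeyError for unsupported extensions

print(source_mime_type("Deck.PPTX"))
```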