How to "write to variable" instead of "to file" in Python

Question

I'm trying to write a function which splits a PDF into separate pages. From this SO answer I copied a simple function which does that:

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
    return pages

This, however, writes the new PDFs to files instead of returning a list of the new PDFs as file variables. So I changed the line output.write(outputStream) to:

pages.append(outputStream)

When I then try to write the elements of the pages list, however, I get a ValueError: I/O operation on closed file.

Does anybody know how I can add the new files to the list and return them, instead of writing them to file? All tips are welcome!
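
For reference, the error can be reproduced without any PDF library. The sketch below is only an illustration (the scratch file and the dummy bytes are made up): a file handle stored inside a with block is closed as soon as the block ends, so the list ends up holding a closed handle rather than the data.

```python
import os
import tempfile

handles = []
# Hypothetical scratch file, used only to reproduce the error.
path = os.path.join(tempfile.mkdtemp(), "page.pdf")

with open(path, "wb") as output_stream:
    output_stream.write(b"%PDF-1.4 dummy")
    handles.append(output_stream)  # stores the handle, not the data

# Leaving the with block closed the file, so the stored handle is unusable.
print(handles[0].closed)

try:
    handles[0].write(b"more")
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```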

Have you tried reading the data, rather than storing the file handle - `pages.append(outputStream.read())`? — jonrsharpe, Oct 23 '14 at 13:32
Have you tried using `cStringIO.StringIO` to open `outputStream`? — user4815162342, Oct 23 '14 at 13:37
what the user above said... you can usually substitute a `StringIO` object for a file and get the result out as a string that way — Anentropic, Oct 23 '14 at 13:40
@jonrsharpe - I just tried it, and that gives me a `IOError: File not open for reading` on the line saying `pages.append(outputStream.read())`. Any other ideas? — kramer65, Oct 23 '14 at 13:40
@user4815162342 - Ehm, no I haven't tried StringIO. Any tips on how to do that? A code example would be very welcome.. :) — kramer65, Oct 23 '14 at 13:41
What is the use case? Do you want a list of file handles to operate on after you call splitPdf? Can't you just use a list of paths instead? — Rod, Oct 23 '14 at 14:04

user4815162342 · Accepted Answer · 2014-10-23T18:53:40.990

score 6

It is not completely clear what you mean by "list of PDFs as file variables". If you want to create strings with the PDF contents instead of files, and return a list of such strings, replace open() with a StringIO object and call getvalue() to obtain the contents:

import cStringIO

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        io = cStringIO.StringIO()    # in-memory buffer instead of a file on disk
        output.write(io)
        pages.append(io.getvalue())  # this page's PDF data as a string
    return pages
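
In Python 3 the cStringIO module no longer exists; io.BytesIO plays the same role for binary data. Below is a minimal sketch of the same pattern. StandInWriter and to_bytes are hypothetical names, and the stand-in replaces PdfFileWriter only so that the example runs without a PDF library; any object with a write(stream) method is handled the same way.

```python
import io


class StandInWriter:
    """Hypothetical stand-in for a PDF writer: anything with write(stream)."""

    def __init__(self, payload):
        self.payload = payload

    def write(self, stream):
        stream.write(self.payload)


def to_bytes(writer):
    buffer = io.BytesIO()      # in-memory binary file
    writer.write(buffer)       # same call that would target a real file
    return buffer.getvalue()   # everything written so far, as bytes


pages = [to_bytes(StandInWriter(b"page %d" % i)) for i in range(3)]
print(pages)
```

Because getvalue() returns a copy of the whole buffer regardless of the current stream position, no seek() is needed here.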

edited Oct 23 '14 at 18:53

answered Oct 23 '14 at 14:36


(This answer is Python 2 only) – Garrett Dec 08 '19 at 02:55
@Garrett It should be quite straightforward to adapt to Python 3, though. – user4815162342 Dec 08 '19 at 07:42

score 5 · Answer 2 · answered Oct 23 '14 at 14:07

You can use the in-memory binary streams in the io module. This keeps the PDF files in memory instead of writing them to disk.

import io

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        outputStream = io.BytesIO()

        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        output.write(outputStream)

        # Move the stream position to the beginning,
        # making it easier for other code to read
        outputStream.seek(0)

        pages.append(outputStream)
    return pages

To later write the objects to a file, use shutil.copyfileobj:

import shutil

with open('page0.pdf', 'wb') as out:
    shutil.copyfileobj(pages[0], out)
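
Note that shutil.copyfileobj copies from the stream's current position, which is why the stream is rewound with seek(0) after writing. A small sketch with made-up bytes in place of real PDF data shows the difference:

```python
import io
import shutil

source = io.BytesIO()
source.write(b"%PDF-1.4 dummy page")  # position is now at the end

without_seek = io.BytesIO()
shutil.copyfileobj(source, without_seek)  # nothing left to read

source.seek(0)  # rewind to the beginning
with_seek = io.BytesIO()
shutil.copyfileobj(source, with_seek)  # copies the full contents

print(len(without_seek.getvalue()), len(with_seek.getvalue()))
```

The same applies when copying a stream a second time: after one copy the position is at the end again, so call seek(0) before reusing it.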

score 1 · Answer 3 · answered Oct 23 '14 at 14:24

I haven't used PdfFileWriter, but I think this should work: keep the PdfFileWriter objects in the list and write them to files later.

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        pages.append(output)
    return pages

def writePdf(pages):
    # Number the output files starting at 1
    for i, p in enumerate(pages, 1):
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            p.write(outputStream)
