Convert PDF to Binary in Python (Django)

Question

I need to deliver a PDF to the browser. The PDF is returned from an API as binary data.

I'm using Python 2.7, Django 1.5, and requests.

I followed the recommendation in the Django docs and installed ReportLab. I also got the following example working well:

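from django.http import HttpResponse
from reportlab.pdfgen import canvas

def some_view(request):
    # Create the response with the PDF content type and an inline disposition.
    response = HttpResponse(content_type="application/pdf")
    response["Content-Disposition"] = "inline; filename=a_test_document.pdf"

    # Use the response as the file-like object that ReportLab writes to.
    p = canvas.Canvas(response)
    p.drawString(100, 500, "Hello world")

    # Finish the page and close the PDF.
    p.showPage()
    p.save()

    return response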

However, that only lets me draw my own PDF. Is there any way for me to convert binary to PDF? I've looked through the ReportLab docs as well as some other solutions but didn't see anything definitive.

How do you mean, "convert binary"? What do you have which is binary? — cwallenpoole, May 29 '14 at 01:45
@cwallenpoole I'm getting binary back from an API with a mime type of application/pdf — diplosaurus, May 29 '14 at 03:59
"Binary" is not a file format. It is not at all helpful to say you are receiving this file "in binary". What are the implications for you for how you are receiving this file, and what is wrong? — Daniel Roseman, May 29 '14 at 07:14
@diplosaurus It sounds like what you're trying to do is modify an existing PDF. If that's the case, you might want to look at this answer. http://stackoverflow.com/a/2180841/57191 — cwallenpoole, May 29 '14 at 12:57

olzhas · Answer 1 · 2014-05-29T04:31:25.870

To generate a PDF, you can use the xhtml2pdf library.

The function render_to_pdf_response returns a response object; you just pass it the name of your template, the context data, and the PDF file name.

import os

from django.conf import settings
from django.http import HttpResponse
from django.template import Context
from django.template.loader import get_template
from xhtml2pdf import pisa


class UnsupportedMediaPathException(Exception):
    """Not defined in the original snippet; a minimal definition so the code runs."""


def fetch_resources(uri, rel):
    """
    Callback to allow xhtml2pdf/reportlab to retrieve images, stylesheets, etc.
    `uri` is the href attribute from the HTML link element.
    `rel` gives a relative path, but it's not used here.
    """
    if uri.startswith(settings.MEDIA_URL):
        path = os.path.join(settings.MEDIA_ROOT,
                            uri.replace(settings.MEDIA_URL, ""))
    elif uri.startswith(settings.STATIC_URL):
        path = os.path.join(settings.STATIC_ROOT,
                            uri.replace(settings.STATIC_URL, ""))
    else:
        # No known prefix: try STATIC_ROOT first, then MEDIA_ROOT.
        path = os.path.join(settings.STATIC_ROOT,
                            uri.replace(settings.STATIC_URL, ""))

        if not os.path.isfile(path):
            path = os.path.join(settings.MEDIA_ROOT,
                                uri.replace(settings.MEDIA_URL, ""))

            if not os.path.isfile(path):
                raise UnsupportedMediaPathException(
                    'media urls must start with %s or %s' % (
                        settings.MEDIA_URL, settings.STATIC_URL))

    return path


def render_to_pdf_response(template_name, context=None, pdfname='test.pdf'):
    # The response doubles as the file object that pisa writes the PDF into.
    file_object = HttpResponse(content_type='application/pdf')
    file_object['Content-Disposition'] = 'attachment; filename=%s' % pdfname
    template = get_template(template_name)
    html = template.render(Context(context))
    pisa.CreatePDF(html.encode("UTF-8"), file_object, encoding='UTF-8',
                   link_callback=fetch_resources)
    return file_object

Installation instructions are here: https://pypi.python.org/pypi/xhtml2pdf/

score 0 · Answer 2 · edited May 23 '17 at 10:32

It looks like you're trying to update an existing PDF instead of simply creating a new one. If so, this answer is probably what you're looking for. To summarize its solution:

Read your PDF using PdfFileReader(); we'll call this input.

Create a new PDF containing the text you want to add using ReportLab, and save it as a string object.

Read the string object using PdfFileReader(); we'll call this text.

Create a new PDF object using PdfFileWriter(); we'll call this output.

Iterate through the pages of input and apply .mergePage(text.getPage(0)) to each page you want the text added to, then use output.addPage() to add the modified pages to the new document.
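The steps above can be sketched roughly as follows. This is an untested illustration, not the linked answer's code: the function name stamp_pdf and its parameters are hypothetical, and it assumes a PyPDF2 release older than 3.0 (where the PdfFileReader/PdfFileWriter names used in this answer still exist) along with ReportLab.

```python
import io


def stamp_pdf(pdf_bytes, text, x=100, y=500):
    """Return new PDF bytes with `text` drawn on every page of `pdf_bytes`.

    Hypothetical helper; assumes PyPDF2 < 3.0 and ReportLab are installed.
    """
    # Third-party imports are done here so the helper can be defined
    # even where these packages are not installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter
    from reportlab.pdfgen import canvas

    # Read the existing PDF ("input" in the steps above).
    source = PdfFileReader(io.BytesIO(pdf_bytes))

    # Draw the text on a one-page PDF held in memory, then read it
    # back with PdfFileReader ("text" in the steps above).
    overlay_buffer = io.BytesIO()
    overlay_canvas = canvas.Canvas(overlay_buffer)
    overlay_canvas.drawString(x, y, text)
    overlay_canvas.showPage()
    overlay_canvas.save()
    overlay_buffer.seek(0)
    overlay = PdfFileReader(overlay_buffer)

    # Create the writer ("output" in the steps above).
    output = PdfFileWriter()

    # Merge the overlay onto each page and add the page to the output.
    for index in range(source.getNumPages()):
        page = source.getPage(index)
        page.mergePage(overlay.getPage(0))
        output.addPage(page)

    # Serialize the merged document back to bytes.
    result = io.BytesIO()
    output.write(result)
    return result.getvalue()
```

The returned bytes could then be passed as the content of an HttpResponse with content_type="application/pdf". Here every page is stamped; to stamp only some pages, merge the overlay conditionally inside the loop.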

On the other hand, if you're uncertain of the file type of the received binary (not likely from your example, but worth mentioning), you can use the python-magic library. Here is an untested example:

In [2]: import magic
In [3]: m = magic.Magic(mime=True)
In [4]: m.from_file('/home/culebron/Documents/chapter2.pdf')
Out[4]: 'application/pdf'

Based on that final output, you could determine:

Whether it is a PDF.
If so, how to apply your desired changes or merge with the current PDF doc.
If not, how to write the contents to the Canvas.
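If installing python-magic is not an option, a simpler check is possible with the standard library alone. The sketch below does not use python-magic; it only tests whether the bytes begin with the "%PDF-" header that PDF files normally start with. The names looks_like_pdf and pick_content_type are hypothetical, and the check assumes the header sits at the very first byte, which may not hold for every file that PDF readers accept.

```python
# Signature that normally opens a PDF file, e.g. b"%PDF-1.4".
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(data):
    """Return True if the byte string starts with the PDF header signature."""
    return data[:len(PDF_SIGNATURE)] == PDF_SIGNATURE


def pick_content_type(data):
    """Choose a content type for the bytes: PDF if the signature matches,
    otherwise the generic binary type."""
    if looks_like_pdf(data):
        return "application/pdf"
    return "application/octet-stream"
```

In a Django view, the bytes from the API (for example the .content attribute of a requests response) could be checked this way and, if they are a PDF, passed straight through with HttpResponse(data, content_type=pick_content_type(data)).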
