Save response.text as PDF

Question

I have a string representing a PDF that I'd like to save as a PDF file. My problem is that saving this string to a file results in a PDF with blank pages.

I've tried encoding the string as 'utf-8' and saving the bytes to a file, but this results in the same issue.

import requests

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
response = requests.get(url)

with open('example.pdf', 'w') as f:
  f.write(response.text)

I'm aware that saving response.content is the correct way to save the PDF in the example above, but in my particular use case I only have access to the string.

Have a look at this thread! https://stackoverflow.com/questions/2252726/how-to-create-pdf-files-in-python — michal-ko, Jul 12 '19 at 11:24
Why can't you use response.content? If you could you could open it using "wb" instead of "w" and write it and it would work perfectly for you. — Nat Cecil, Jul 12 '19 at 11:27
Possible duplicate of [Download and save PDF file with Python requests module](https://stackoverflow.com/questions/34503412/download-and-save-pdf-file-with-python-requests-module) — Zaraki Kenpachi, Jul 12 '19 at 11:35
As I said, I'm unable to access the content; I only have the text. My snippet above shows how to replicate the issue, but in my use case I'm getting the response text from elsewhere (not using the requests library). – bgordon, Jul 14 '19 at 10:59
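
For reference, the approach the comments describe (writing the raw bytes in "wb" mode) can be sketched as below. This is only an illustration: save_pdf_bytes is a hypothetical helper name, and the sample bytes are a made-up stand-in for response.content so that the snippet runs without a network connection.

```python
import os
import tempfile


def save_pdf_bytes(data: bytes, path: str) -> None:
    """Write raw bytes to disk in binary mode, with no text encoding step."""
    with open(path, "wb") as f:
        f.write(data)


# Made-up stand-in for response.content (assumption: real PDF bytes in practice).
sample = b"%PDF-1.4\n%\xe4\xfc\xf6\xdf\n"

path = os.path.join(tempfile.mkdtemp(), "example.pdf")
save_pdf_bytes(sample, path)

# Reading the file back in binary mode returns exactly the bytes written.
with open(path, "rb") as f:
    round_tripped = f.read()

print(round_tripped == sample)
```

With requests, the call would be save_pdf_bytes(response.content, 'example.pdf'). This only helps when the raw bytes are available, which the asker says is not the case here.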

score 0 · Answer 1 · answered Jul 12 '19 at 11:25


You could try using the fpdf library.

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", size=12)
pdf.cell(200, 10, txt=response.text, ln=1, align="C")
pdf.output("output.pdf")

Reference: http://www.blog.pythonlibrary.org/2018/06/05/creating-pdfs-with-pyfpdf-and-python/

Docs: https://pyfpdf.readthedocs.io/en/latest/index.html

Dainius Preimantas

I just tried this and received this error: `UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 122: ordinal not in range(256)`. The text I'm trying to save looks like this (first 200 characters): `%PDF-1.4\n%äüöß\n2 0 obj\n<>\nstream\nx�=��\n\x021\x0cE��v\x11��0\x08\x0e�~��\x0f�\x00\x17��` i.e. it's the text of the actual PDF, not html – bgordon Jul 12 '19 at 14:02
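
The '\ufffd' in that error is the Unicode replacement character, which usually appears when binary data has been decoded as text with invalid bytes replaced. A minimal sketch of why such a string may not convert back into the original file is shown below. The byte values are invented for illustration and are not taken from the asker's PDF; how the asker's string was actually decoded is not known.

```python
# Invented sample: a PDF-like header followed by a few arbitrary binary bytes.
raw = "%PDF-1.4\n%äüöß\n".encode("utf-8") + b"x\x9c=\x8e\xc1\n"

# Decoding as UTF-8 with replacement turns every invalid byte into U+FFFD,
# so different original bytes become the same character.
lossy_text = raw.decode("utf-8", errors="replace")
lossy_bytes = lossy_text.encode("utf-8")

# Latin-1 maps each byte value 0-255 to exactly one code point,
# so decoding and re-encoding with it gives the original bytes back.
latin1_text = raw.decode("latin-1")
restored = latin1_text.encode("latin-1")

print("\ufffd" in lossy_text)  # True
print(lossy_bytes == raw)      # False
print(restored == raw)         # True
```

If the string already contains U+FFFD characters, the bytes they replaced cannot be recovered from the string alone. If the string can instead be obtained from a byte-preserving decode such as Latin-1, encoding it with the same codec and writing the result in "wb" mode should reproduce the file.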

score 0 · Answer 2 · answered Jul 12 '19 at 11:31

From a link I posted before:

Use Pdfkit.

It creates PDFs from HTML files. I chose it to create PDFs in 2 steps from my Python Pyramid stack:

1. Render server-side with Mako templates, using the style and markup you want for your PDF document.
2. Call the pdfkit.from_string(...) method, passing the rendered HTML as the parameter.

This way you get a PDF document with styling and images supported.

You can install it using pip:

pip install pdfkit

You will also need to install wkhtmltopdf (on Ubuntu).

This won't work, the text I'm trying to save isn't html, it's the string of a pdf — bgordon, Jul 12 '19 at 14:00

Mahsa Hassankashi · Answer 3 · 2019-07-12T14:18:43.140


Try it:

import pdftotext

# Load your PDF
with open(r'C:\Users\Mahsa\Desktop\stack\dummy.pdf', "rb") as f:
    pdf = pdftotext.PDF(f)

For saving text to pdf:

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_xy(0, 0)
pdf.set_font('arial', 'B', 13.0)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt="Your text from ", border=0)
pdf.output(r'D:\pdf\test.pdf', 'F')

edited Jul 12 '19 at 14:18

answered Jul 12 '19 at 11:59


This loads the pdf as text, I'm trying to save a text representation of a PDF to a file – bgordon Jul 12 '19 at 13:59
@bgordon I updated my answer, you can save text to pdf. – Mahsa Hassankashi Jul 12 '19 at 14:19
