
I have a string representing a PDF that I'd like to save as a PDF file. My problem is that saving this string to a file results in a PDF with blank pages.

I've tried encoding the string as UTF-8 and saving the bytes to a file, but this results in the same issue.
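Here is a minimal sketch of why re-encoding as UTF-8 doesn't help. A PDF is binary data. Once its bytes have been decoded into a str with replacement, any byte sequence the codec couldn't map becomes U+FFFD, and the original byte is gone for good. Even when nothing is replaced, UTF-8 writes every character above U+007F as two or more bytes. That shifts the byte offsets the PDF's cross-reference table relies on and corrupts compressed streams, which is one plausible reason for the blank pages. The sample bytes below are made up for illustration.

```python
# Hypothetical sample bytes resembling the start of a PDF (header plus a few
# binary bytes such as those found in compressed streams).
raw = b"%PDF-1.4\n%\xe4\xfc\xf6\xdf\nstream\nx\x9c\x8b\x00"

# Turning binary data into text with UTF-8 and replacement: every byte
# sequence that is not valid UTF-8 becomes U+FFFD.
text = raw.decode("utf-8", errors="replace")

# Encoding back as UTF-8 writes each U+FFFD as the three bytes EF BF BD,
# so the original bytes cannot be recovered.
restored = text.encode("utf-8")

print("\ufffd" in text)  # True
print(restored == raw)   # False
```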

import requests

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
response = requests.get(url)

# Reproduces the issue: writing the decoded text in text mode ('w')
with open('example.pdf', 'w') as f:
    f.write(response.text)

I'm aware that saving response.content is the correct way to save the PDF in the example above, but in my particular use case I only have access to the string.
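The original bytes can be rebuilt only if the string was produced by a codec that maps every byte to exactly one character. In that case, encode with that same codec and write in binary mode. Below is a minimal sketch, assuming the text was decoded as latin-1 (ISO-8859-1). `save_pdf_string` is a hypothetical helper name, and the sample bytes are made up.

```python
import os
import tempfile


def save_pdf_string(pdf_text: str, path: str, encoding: str = "latin-1") -> None:
    """Write a PDF that was decoded into a str back to disk as raw bytes.

    Assumption: `encoding` is the codec that originally produced `pdf_text`.
    latin-1 maps each byte 0-255 to exactly one code point, so
    bytes -> str -> bytes round-trips without loss.
    """
    # Raises UnicodeEncodeError if the text contains characters the codec
    # cannot represent, e.g. U+FFFD left behind by a lossy decode.
    data = pdf_text.encode(encoding)
    # Binary mode ("wb"): no further encoding and no newline translation.
    with open(path, "wb") as f:
        f.write(data)


# Usage: simulate a string that was decoded with latin-1, then save it.
original = b"%PDF-1.4\n%\xe4\xfc\xf6\xdf\n" + bytes(range(256))
pdf_text = original.decode("latin-1")

out_path = os.path.join(tempfile.mkdtemp(), "example.pdf")
save_pdf_string(pdf_text, out_path)

with open(out_path, "rb") as f:
    print(f.read() == original)  # True
```

To try this with the requests snippet above, set `response.encoding = 'latin-1'` before reading `response.text`. Then `response.text.encode('latin-1')` equals `response.content`. If your string came from a different, lossy decode (for example one that left U+FFFD characters), the replaced bytes cannot be recovered this way. In that case you would need the original bytes from wherever the text is produced.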

bgordon
  • Have a look at this thread! https://stackoverflow.com/questions/2252726/how-to-create-pdf-files-in-python – michal-ko Jul 12 '19 at 11:24
  • Why can't you use response.content? If you could, you could open the file with "wb" instead of "w", write it, and it would work perfectly for you. – Nat Cecil Jul 12 '19 at 11:27
  • Possible duplicate of [Download and save PDF file with Python requests module](https://stackoverflow.com/questions/34503412/download-and-save-pdf-file-with-python-requests-module) – Zaraki Kenpachi Jul 12 '19 at 11:35
  • As I said, I'm unable to access the content; I only have the text. My snippet above shows how to replicate the issue, but in my use case I'm getting the response text from elsewhere (not using the requests library). – bgordon Jul 14 '19 at 10:59

3 Answers


You could try using the fpdf library:

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", size=12)
pdf.cell(200, 10, txt=response.text, ln=1, align="C")
pdf.output("output.pdf")

Reference: http://www.blog.pythonlibrary.org/2018/06/05/creating-pdfs-with-pyfpdf-and-python/

Docs: https://pyfpdf.readthedocs.io/en/latest/index.html

  • I just tried this and received this error: `UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 122: ordinal not in range(256)`. The text I'm trying to save looks like this (first 200 characters): `%PDF-1.4\n%äüöß\n2 0 obj\n<>\nstream\nx�=��\n\x021\x0cE����v\x11���0\x08\x0e�~��\x0f�\x00\x17���` i.e. it's the text of the actual PDF, not html – bgordon Jul 12 '19 at 14:02
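The comment above shows that the string starts with "%PDF-1.4", so it is the raw file rather than HTML. The error also mentions '\ufffd' (U+FFFD, the replacement character), which the latin-1 codec named in the error cannot represent. A small helper can check for both before you decide how to save the string. `describe_pdf_string` is hypothetical, not part of fpdf or any library, and the sample string is made up in the style of the comment.

```python
def describe_pdf_string(text: str) -> dict:
    """Quick checks on a str that is supposed to hold raw PDF data (hypothetical helper)."""
    return {
        # Raw PDF files start with a header such as "%PDF-1.4".
        "has_pdf_header": text.startswith("%PDF-"),
        # U+FFFD means some bytes were already replaced during decoding and are lost.
        "has_replacement_chars": "\ufffd" in text,
        # Every char <= U+00FF means latin-1 can encode it back byte-for-byte.
        "latin1_encodable": all(ord(ch) <= 0xFF for ch in text),
    }


sample = "%PDF-1.4\n%äüöß\n2 0 obj\nstream\nx\ufffd="
print(describe_pdf_string(sample))
# {'has_pdf_header': True, 'has_replacement_chars': True, 'latin1_encodable': False}
```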

From the thread I linked above:

Use pdfkit.

It creates PDFs from HTML. I chose it to create PDFs in two steps from my Python Pyramid stack:

1. Rendering server-side with Mako templates, with the style and markup you want for your PDF document.
2. Executing the pdfkit.from_string(...) method, passing the rendered HTML as a parameter.

This way you get a PDF document with styling and images supported.

You can install it with pip:

pip install pdfkit

You will also need to install wkhtmltopdf, the command-line tool that pdfkit wraps (on Ubuntu: sudo apt-get install wkhtmltopdf).
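Here is a minimal sketch of the two steps above. To keep it dependency-free, Python's string.Template stands in for the Mako templates the answer mentions. `render_html` and `html_to_pdf` are hypothetical helper names. Only `pdfkit.from_string(html, output_path)` comes from pdfkit, and it needs the wkhtmltopdf binary installed.

```python
from string import Template

# Step 1 (stand-in for Mako): render HTML with the standard library's
# string.Template so the sketch has no extra template dependency.
PAGE = Template("""<html>
  <head><style>h1 { color: #336; }</style></head>
  <body><h1>$title</h1><p>$body</p></body>
</html>""")


def render_html(title: str, body: str) -> str:
    """Render the HTML that will be converted to PDF (hypothetical helper)."""
    return PAGE.substitute(title=title, body=body)


def html_to_pdf(html: str, output_path: str) -> None:
    """Step 2: convert the rendered HTML to a PDF file with pdfkit.

    Requires `pip install pdfkit` and the wkhtmltopdf binary on PATH.
    """
    import pdfkit  # imported lazily so render_html works without pdfkit installed

    pdfkit.from_string(html, output_path)
```

Usage: `html_to_pdf(render_html("Report", "Hello from pdfkit"), "output.pdf")`. This builds a new PDF from HTML. It does not turn an existing raw PDF string back into a file.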

michal-ko

Try this:

import pdftotext

# Load your PDF (use the path to your own file)
with open('dummy.pdf', 'rb') as f:
    pdf = pdftotext.PDF(f)

# Join the extracted text of all pages
text = "\n\n".join(pdf)

To save text to a PDF:

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_xy(0, 0)
pdf.set_font('arial', 'B', 13.0)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt="Your text from ", border=0)
pdf.output(r'D:\pdf\test.pdf', 'F')
Mahsa Hassankashi