I'm struggling to generate even a simple PDF with non-ASCII characters using Python 3.5.2, python-pdfkit and wkhtmltox-0.12.2.

This is the easiest example I could write:

import pdfkit
html_content = u'<p>ö</p>'
pdfkit.from_string(html_content, 'out.pdf')

This is what the output document looks like: [image: non-ASCII character incorrectly shown in the PDF]

jllopezpino

3 Answers

I found out that I just needed to add a meta tag with a charset attribute to my HTML code:

import pdfkit

html_content = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
</head>
<body>
    <p>&euro;</p>
    <p>áéíóúñö</p>
</body>
</html>
"""

pdfkit.from_string(html_content, 'out.pdf')

I actually spent quite some time following wrong solutions like the one suggested here. In case someone is interested, I wrote a short story on my blog. Sorry for the SPAM :)
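A quick way to see why the declaration matters: the UTF-8 encoding of `ö` is two bytes, and if those bytes are decoded with a single-byte encoding such as Latin-1, each byte turns into its own character. The sketch below only simulates that decoding step in plain Python, and it assumes (not confirmed in this thread) that wkhtmltopdf falls back to Latin-1 when the HTML declares no charset.

```python
# Assumption: wkhtmltopdf decodes input without a declared charset as Latin-1.
# This only simulates that decoding step; no PDF is produced.
original = '<p>ö</p>'
utf8_bytes = original.encode('utf-8')   # 'ö' (U+00F6) becomes the two bytes C3 B6
misread = utf8_bytes.decode('latin-1')  # Latin-1 maps every byte to one character
print(utf8_bytes)  # b'<p>\xc3\xb6</p>'
print(misread)     # <p>Ã¶</p>

# Decoding with the right charset recovers the original text.
assert utf8_bytes.decode('utf-8') == original
```

This is the kind of garbling the `<meta charset="utf-8">` tag avoids, since it tells the renderer which encoding to use.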

jllopezpino
  • I'm glad that it helped, @VishnuYS! – jllopezpino Sep 28 '17 at 14:50
  • This only works if you are using pdfkit.from_string(); I was using pdfkit.from_file() and it didn't work. – Cristóbal Felipe Fica Urzúa Aug 10 '18 at 18:31
  • I was trying to extract the HTML from an email and generate a PDF from it, and even though I had a proper Unicode string with accents as the input for pdfkit.from_string, the output had messed-up encoding. Manually adding the meta charset attribute to the HTML code (in the Unicode string) works like a charm, and pdfkit generates a proper PDF with accents and other non-ASCII characters. Most of the time the <head> is empty, so I do a simple replace; otherwise I insert one (if it's absent) or append to it (if it's present and not empty). Thanks @jllopezpino – Jeb Dec 11 '18 at 16:40
  • Works perfectly! Thanks! (I am using from_string) – Marcus Oct 22 '19 at 21:46
  • How can I make it work with `pdfkit.from_file()`? – GitHunter0 Aug 18 '21 at 03:08
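The approach described in Jeb's comment (add the tag only when it is missing) can be sketched with a small helper. `ensure_meta_charset` is a hypothetical name, and regular expressions are a simplification: they are fine for typical documents but are not a real HTML parser.

```python
import re

META = '<meta charset="utf-8">'

# `(\s[^>]*)?` stops `<head` from also matching `<header>`.
HEAD_RE = re.compile(r'<head(\s[^>]*)?>', re.IGNORECASE)
HTML_RE = re.compile(r'<html(\s[^>]*)?>', re.IGNORECASE)
CHARSET_RE = re.compile(r'<meta[^>]*charset', re.IGNORECASE)


def ensure_meta_charset(html: str) -> str:
    """Hypothetical helper: return `html` with a UTF-8 charset declaration."""
    # A charset is already declared: leave the document untouched.
    if CHARSET_RE.search(html):
        return html
    # There is a <head>: insert the tag right after its opening tag.
    head = HEAD_RE.search(html)
    if head:
        return html[:head.end()] + META + html[head.end():]
    # No <head> but an <html> tag: add a <head> right after it.
    root = HTML_RE.search(html)
    if root:
        return html[:root.end()] + '<head>' + META + '</head>' + html[root.end():]
    # Bare fragment such as '<p>ö</p>': prepend the tag; HTML parsers place it in an implied <head>.
    return META + html


print(ensure_meta_charset('<p>ö</p>'))  # <meta charset="utf-8"><p>ö</p>
```

For the `pdfkit.from_file()` case asked about above, one option (assuming the file really is UTF-8) is to read the file yourself with `open(path, encoding='utf-8')`, pass the text through this helper and call `pdfkit.from_string()` on the result.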
There is a relevant issue in the pdfkit project, https://github.com/devongovett/pdfkit/issues/470, that says:

"You need to use an embedded font. The built-in fonts have a limited character set available."

An answer to the question "How to: output Euro symbol in pdfkit for nodejs" gives a clue about how to do it.

piokuc
  • I realised that there are 3 projects called pdfkit. The one for nodejs has nothing to do with the one for Python. The ones for Python and Ruby are interfaces to wkhtmltopdf. Thanks for replying anyway! – jllopezpino May 27 '17 at 19:22
It is also possible to set the charset in the options. This way you don't have to alter the HTML, which is especially useful if you're not the one creating it and don't want to mess with it.

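import pdfkit

def get_options():
    return {
        'encoding': 'UTF-8',               # default text encoding for input without a declared charset
        'enable-local-file-access': True,  # lets wkhtmltopdf load local files (images, CSS)
    }

html = '<p>áéíóúñö</p>'          # no <meta charset> tag needed
config = pdfkit.configuration()  # stand-in for the original _pdfkit_config; looks for wkhtmltopdf on PATH
pdfkit.from_string(html, 'out.pdf', verbose=True, options=get_options(), configuration=config)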
user2793390