How to generate a PDF with non-ascii characters using from_string from python-pdfkit

Question

I'm struggling to generate just a simple PDF with non-ascii characters using Python 3.5.2, python-pdfkit and wkhtmltox-0.12.2.

This is the easiest example I could write:

import pdfkit
html_content = u'<p>ö</p>'
pdfkit.from_string(html_content, 'out.pdf')

This is what the output document looks like:

score 33 · Accepted Answer · answered May 27 '17 at 19:26

I found out that I just needed to add a meta tag with charset attribute to my HTML code:

import pdfkit

html_content = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
</head>
<body>
    <p>&euro;</p>
    <p>áéíóúñö</p>
</body>
</html>
"""

pdfkit.from_string(html_content, 'out.pdf')

I actually spent quite some time following wrong solutions like the one suggested here. In case someone is interested, I wrote a short story on my blog. Sorry for the SPAM :)
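When the HTML comes from somewhere else (an email body, a template you do not control), the same fix can be applied programmatically. The following is only a minimal sketch: `ensure_utf8_meta` and `string_to_pdf` are hypothetical helper names, not part of python-pdfkit, and the regular expressions are a rough heuristic rather than a real HTML parser. The only pdfkit call used is `pdfkit.from_string`, as in the answer above.

```python
import re

META_CHARSET = '<meta charset="utf-8">'


def ensure_utf8_meta(html):
    """Return html with a charset declaration, adding a UTF-8 one if none is found."""
    # Any existing <meta ... charset ...> declaration is left untouched.
    if re.search(r"<meta\s[^>]*charset", html, flags=re.IGNORECASE):
        return html
    # Preferred spot: right after the opening <head> tag (but not <header>).
    head = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head:
        return html[:head.end()] + META_CHARSET + html[head.end():]
    # No <head>: create one right after the opening <html> tag.
    root = re.search(r"<html(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if root:
        return (html[:root.end()] + "<head>" + META_CHARSET + "</head>"
                + html[root.end():])
    # Bare fragment such as '<p>ö</p>': prepend the tag.
    return META_CHARSET + html


def string_to_pdf(html, pdf_path):
    """Hypothetical wrapper: declare the charset, then hand the HTML to pdfkit."""
    import pdfkit  # imported here so ensure_utf8_meta works without pdfkit installed
    return pdfkit.from_string(ensure_utf8_meta(html), pdf_path)
```

Calling `string_to_pdf(u'<p>ö</p>', 'out.pdf')` would then be the equivalent of the snippet from the question, with the charset declaration added before wkhtmltopdf sees the markup.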

jllopezpino

I'm glad that it helped @VishnuYS ! – jllopezpino Sep 28 '17 at 14:50

This only works if you are using pdfkit.from_string(). I was using pdfkit.from_file() and it didn't work. – Cristóbal Felipe Fica Urzúa Aug 10 '18 at 18:31

I was trying to extract the HTML from an email and generate a PDF from it, and even though I had a proper unicode string with accents as the input for pdfkit.from_string, the output had messed-up encoding. Manually adding the meta charset attribute to the HTML code (in the unicode string) works like a charm, and pdfkit generates a proper PDF with accents and other non-ASCII characters. Most of the time the <head> is empty so I do a simple replace; otherwise I insert it after <html> (if <head> is absent) or append to it (if it's present and not empty). Thanks @jllopezpino – Jeb Dec 11 '18 at 16:40

Works perfect! Thanks! (I am using from_string) – Marcus Oct 22 '19 at 21:46
Please, how to make it work with `pdfkit.from_file()`? – GitHunter0 Aug 18 '21 at 03:08
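For the `pdfkit.from_file()` case raised in the comments, one possible workaround (not confirmed anywhere in this thread) is to decode the file yourself and go through `pdfkit.from_string()` instead. The sketch below assumes you know the encoding the file was saved with; `read_html` and `file_to_pdf` are hypothetical helper names. It also passes the `encoding` option from another answer here, which reportedly does not help in every setup, so adding a meta charset tag to the HTML remains the fix confirmed above.

```python
from pathlib import Path


def read_html(html_path, source_encoding="utf-8"):
    """Decode the HTML file into a str using the encoding it was saved with."""
    return Path(html_path).read_text(encoding=source_encoding)


def file_to_pdf(html_path, pdf_path, source_encoding="utf-8"):
    """Hypothetical helper: render an HTML file by way of pdfkit.from_string."""
    import pdfkit  # imported lazily; only needed when a PDF is actually rendered
    html = read_html(html_path, source_encoding)
    # 'encoding' is forwarded to wkhtmltopdf as its --encoding option.
    return pdfkit.from_string(html, pdf_path, options={"encoding": "UTF-8"})
```

One caveat of this approach: because the markup no longer reaches wkhtmltopdf as a file, relative links to images or stylesheets inside the HTML may stop resolving.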

score 1 · Answer 2 · answered May 27 '17 at 09:30

There is a relevant issue in the pdfkit project, https://github.com/devongovett/pdfkit/issues/470, that says:

"You need to use an embedded font. The built-in fonts have a limited character set available."

An answer to the question "How to: output Euro symbol in pdfkit for nodejs" gives a clue how to do it.

piokuc

I realised that there are 3 projects called pdfkit. The one for nodejs has nothing to do with the one for Python. The ones for Python and Ruby are an interface for wkhtmltopdf. Thanks for replying anyway! – jllopezpino May 27 '17 at 19:22

score 1 · Answer 3 · answered Mar 17 '22 at 20:11

It is also possible to set the charset in the options. This way you don't have to alter the HTML file, which is especially useful if you're not the one creating it and you don't want to mess with it.

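import pdfkit

def get_options():
    return {
        'encoding': 'UTF-8',  # forwarded to wkhtmltopdf as --encoding
        'enable-local-file-access': True
    }

# html and _pdfkit_config are assumed to be defined elsewhere
pdfkit.from_string(html, verbose=True, options=get_options(), configuration=_pdfkit_config)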

user2793390

This is not working as you said if the original HTML is not altered. – Nwawel A Iroume Aug 03 '22 at 10:02
