
I am trying to convert an HTML file to PDF using the pdfkit Python library. I followed the documentation from here.

Currently, I am trying to convert plain text to PDF instead of a whole HTML document. Everything runs without errors, but instead of the text I see boxes in the generated PDF. This is my code:

import pdfkit
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf/wkhtmltox/bin/wkhtmltopdf')
content = 'This is a paragraph which I am trying to convert to pdf.'
pdfkit.from_string(content,'test.pdf',configuration=config)

This is the output (screenshot: boxes are displayed instead of plain text).

Instead of the text 'This is a paragraph which I am trying to convert to pdf.', the converted PDF contains boxes.

Any help is appreciated. Thank you :)

Nandan Bhat
  • @アレックス I was able to solve this. It was caused by the fonts. – Nandan Bhat Feb 28 '18 at 09:22
  • I downloaded the fonts (TTF files), placed them in "/usr/share/fonts/" and rebooted, and it worked. "ms-pgothic" and "hiragino-kaku-gothic-pro-w3" are the fonts I used. – Nandan Bhat Feb 28 '18 at 09:27

2 Answers


I was unable to reproduce the issue with Python 2.7 on Ubuntu 16.04; it works fine with those specs. As I understand it, the problem is that your operating system does not have the font (or encoding) that pdfkit uses to generate the file.

Maybe try doing this:

import pdfkit
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf/wkhtmltox/bin/wkhtmltopdf')
content = 'This is a paragraph which I am trying to convert to pdf.'
options = {
    'encoding':'utf-8',
}
pdfkit.from_string(content,'test.pdf',configuration=config, options=options)

Options that modify the PDF can be passed as a dictionary to the options argument of from_string. The list of options can be found here.
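Since the question passes bare text rather than a full HTML document, one thing worth trying alongside the `encoding` option is to wrap the text in a minimal HTML page that declares UTF-8 and names a font you know is installed. This is a minimal sketch of that idea, not something pdfkit requires; `build_html` and `render_pdf` are hypothetical helpers, and the default `font_family` of Arial is only an example.

```python
import html


def build_html(text: str, font_family: str) -> str:
    """Wrap plain text in a minimal UTF-8 HTML document that names an explicit font."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head>"
        '<meta charset="utf-8">'
        # Fall back to a generic sans-serif if the named family is missing.
        f"<style>body {{ font-family: '{font_family}', sans-serif; }}</style>"
        "</head><body>"
        # Escape the text so characters like < and & are not parsed as markup.
        f"<p>{html.escape(text)}</p>"
        "</body></html>"
    )


# Same option as in the answer; pdfkit forwards it to wkhtmltopdf as --encoding.
OPTIONS = {"encoding": "utf-8"}


def render_pdf(text: str, out_path: str, wkhtmltopdf_path: str, font_family: str = "Arial") -> None:
    """Hypothetical helper: render text to a PDF at out_path via pdfkit."""
    import pdfkit  # imported lazily so build_html works without pdfkit installed

    config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    pdfkit.from_string(
        build_html(text, font_family), out_path, configuration=config, options=OPTIONS
    )
```

For example, `render_pdf(content, 'test.pdf', '/usr/local/bin/wkhtmltopdf/wkhtmltox/bin/wkhtmltopdf', font_family='MS PGothic')` would reuse the binary path from the question with one of the fonts the asker later installed, assuming fontconfig reports that family under the name "MS PGothic".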

Shubham Mishra
  • I tried your suggestion, but it didn't help. What you said earlier about the font makes sense, though. Can you explain that in more detail? I am running the code on Python 3.6. – Nandan Bhat Feb 15 '18 at 12:19
  • If the font that pdfkit writes into the PDF file is not installed on your machine, applications (like Acrobat Reader) won't be able to display it and will show boxes as placeholders. That said, pdfkit uses Arial as the default font, which I am sure must be present on your machine; in case it's not, install it. Edit: I tried reproducing with Python 3.6 too, and it still works for me. – Shubham Mishra Feb 15 '18 at 12:44
  • Hmm. I am using Linux and I am not sure how to check which fonts are installed. – Nandan Bhat Feb 15 '18 at 12:59
  • If you are on an Ubuntu (Debian) machine, you can try `sudo apt-get install ttf-mscorefonts-installer` followed by `sudo fc-cache`, and then check for the Arial font with `fc-match Arial`. – Shubham Mishra Feb 16 '18 at 06:18
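To check from Python which font families fontconfig (the library that `fc-match` and `fc-cache` belong to) can see, a minimal sketch is to parse the output of `fc-list : family`. The helper names are hypothetical, and the parsing assumes one font per line with comma-separated family names; backslash-escaped characters in names are not handled.

```python
from __future__ import annotations

import shutil
import subprocess


def parse_family_list(output: str) -> set[str]:
    """Turn `fc-list : family` output into a set of lower-cased family names.

    Assumes one font per line, with several (e.g. localized) names separated
    by commas. Backslash-escaped characters are not unescaped in this sketch.
    """
    families: set[str] = set()
    for line in output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name.lower())
    return families


def installed_font_families() -> set[str]:
    """Ask fontconfig for installed families; empty set if fc-list is unavailable."""
    if shutil.which("fc-list") is None:
        return set()
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    )
    return parse_family_list(result.stdout)


def has_font(family: str, families: set[str]) -> bool:
    """Case-insensitive membership test against a set from parse_family_list."""
    return family.lower() in families


if __name__ == "__main__":
    found = installed_font_families()
    for wanted in ("Arial", "MS PGothic"):
        print(f"{wanted}: {'installed' if has_font(wanted, found) else 'missing'}")
```

An exact family match is used here because `fc-match` always returns its best match, which may be a substitute font rather than the one you asked for.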

This issue is discussed here: Include custom fonts in AWS Lambda

If you are using pdfkit on AWS Lambda, you will have to set the environment variables "FONTCONFIG_PATH": '/opt/fonts/' and "FONTCONFIG_FILE": '/opt/fonts/fonts.conf'.
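fontconfig reads `FONTCONFIG_PATH` (its configuration directory) and `FONTCONFIG_FILE` (its configuration file) from the environment, so one way to apply this on Lambda is to set them from Python before pdfkit starts wkhtmltopdf. A minimal sketch, assuming the fonts and a `fonts.conf` are shipped in a layer under `/opt/fonts/` and that the wkhtmltopdf subprocess inherits this process's environment; the helper names and the `/tmp/fonts-cache/` cache directory are illustrative assumptions.

```python
import os

# Paths from this answer; on Lambda, layer contents are mounted under /opt.
FONT_DIR = "/opt/fonts/"
FONTS_CONF = "/opt/fonts/fonts.conf"


def fontconfig_env(font_dir: str = FONT_DIR, conf_file: str = FONTS_CONF) -> dict:
    """Environment variables fontconfig reads to locate its configuration."""
    return {"FONTCONFIG_PATH": font_dir, "FONTCONFIG_FILE": conf_file}


def minimal_fonts_conf(font_dir: str = FONT_DIR, cache_dir: str = "/tmp/fonts-cache/") -> str:
    """Hypothetical minimal fonts.conf pointing fontconfig at font_dir."""
    # /opt is read-only on Lambda, so the cache directory lives under /tmp.
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
        "<fontconfig>\n"
        f"  <dir>{font_dir}</dir>\n"
        f"  <cachedir>{cache_dir}</cachedir>\n"
        "</fontconfig>\n"
    )


def apply_fontconfig_env() -> None:
    # Call before pdfkit launches wkhtmltopdf so the child process inherits these.
    os.environ.update(fontconfig_env())
```

`apply_fontconfig_env()` would typically run at import time in the handler module, while the output of `minimal_fonts_conf()` would be written to `fonts.conf` when building the layer, not at runtime, since `/opt` cannot be written to on Lambda.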

If the problem occurs in your local environment, a fresh installation of wkhtmltopdf should resolve it.
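Before and after reinstalling, it can help to confirm which wkhtmltopdf binary is actually being used, since the question hard-codes a path in `pdfkit.configuration`. This is a minimal sketch; the helper names are hypothetical, and the version regex assumes `--version` output of the form `wkhtmltopdf 0.12.6 (with patched qt)`.

```python
import os
import re
import shutil
import subprocess

# Fallback taken from the question's configuration; adjust to your install.
QUESTION_PATH = "/usr/local/bin/wkhtmltopdf/wkhtmltox/bin/wkhtmltopdf"


def parse_version(output: str):
    """Extract (major, minor, patch) from `wkhtmltopdf --version` output, or None."""
    match = re.search(r"wkhtmltopdf (\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def find_wkhtmltopdf(fallback: str = QUESTION_PATH):
    """Return wkhtmltopdf from PATH, else the fallback if it is executable, else None."""
    found = shutil.which("wkhtmltopdf")
    if found:
        return found
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None


def installed_version(binary: str):
    """Run `<binary> --version` and parse the result."""
    result = subprocess.run([binary, "--version"], capture_output=True, text=True, check=True)
    return parse_version(result.stdout)


if __name__ == "__main__":
    binary = find_wkhtmltopdf()
    if binary is None:
        print("wkhtmltopdf not found; (re)install it")
    else:
        print(binary, installed_version(binary))
```

The path returned by `find_wkhtmltopdf()` can then be passed as `pdfkit.configuration(wkhtmltopdf=...)` instead of a hard-coded string.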

parth_sh