I'm trying to convert the following HTML to a PDF using wkhtmltopdf, version 0.12.2.1 (with patched qt):

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>

<body>
&#x1f60b;
</body>

</html>

The HTML contains the hexadecimal character reference for 😋 (U+1F60B). It renders correctly as an emoji in the browser, but in my PDF it looks like this:

[Image: pdf hex character]

Why is it displayed like that and how can I fix this?

The command I'm using is:

wkhtmltopdf /tmp/test.html /tmp/foo.pdf

Michiel Borkent
  • Probably what you see in the PDF is the surrogate pair used to encode that emoticon in UTF-16 rendered as 2 glyphs. wkhtmltopdf may not support Unicode characters above U+FFFF. – roeland Nov 25 '15 at 00:02
  • What happens if you replace the HTML entity with UTF-8 bytes? – user193661 Nov 25 '15 at 04:16
  • @user193661 Then it works. – Michiel Borkent Nov 25 '15 at 08:25
  • @MichielBorkent Can you share specifics on how you used UTF-8 bytes? Did you have to use any flags on `wkhtmltopdf` to get it to work? – strange quark Dec 18 '15 at 23:50
  • @strangequark Not that I can remember – Michiel Borkent Dec 19 '15 at 09:23
  • @MichielBorkent I understand that this post is a bit old, but can you explain in more detail what you did to fix the problem? I'm not sure how to "replace the HTML entity with UTF-8 bytes", you already have . Thanks! – cinny Sep 09 '18 at 19:33
  • Sorry, I don't remember what I did there and I can't find it in the code I used around that time. In newer projects I started using https://github.com/arachnys/athenapdf. – Michiel Borkent Sep 10 '18 at 08:11
  • See https://stackoverflow.com/questions/33901625/smiley-emoticon-showed-as-weird-character-in-pdf-made-with-wkhtmltopdf – Rajan Nov 27 '22 at 15:14
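
The two ideas raised in the comments above (the UTF-16 surrogate pair, and replacing the entity with UTF-8 bytes) can be illustrated with a short Python sketch. This is not the asker's original fix, which was never recovered. The helper names `utf16_code_units` and `replace_astral_refs` are hypothetical, and whether the rewritten HTML renders correctly will still depend on your wkhtmltopdf build and installed fonts.

```python
import re

# Matches decimal (&#128523;) and hexadecimal (&#x1f60b;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def utf16_code_units(char):
    """Return the UTF-16 code units that encode a single character."""
    data = char.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def replace_astral_refs(markup):
    """Replace numeric references above U+FFFF with the literal character.

    Other references (such as &amp; or &#65;) are left untouched so the
    markup keeps its meaning.
    """
    def _literal(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0xFFFF < code_point <= 0x10FFFF:
            return chr(code_point)
        return match.group(0)

    return _NUMERIC_REF.sub(_literal, markup)


emoji = chr(0x1F60B)
# U+1F60B does not fit in one 16-bit unit, so UTF-16 uses a surrogate pair.
print([hex(unit) for unit in utf16_code_units(emoji)])  # ['0xd83d', '0xde0b']
# In UTF-8 the same character is a single four-byte sequence.
print(emoji.encode("utf-8"))  # b'\xf0\x9f\x98\x8b'

html_in = '<meta charset="UTF-8"><body>&#x1f60b;</body>'
html_out = replace_astral_refs(html_in)
utf8_bytes = html_out.encode("utf-8")  # bytes to write to the .html file
```

Writing `utf8_bytes` to the file passed to wkhtmltopdf gives a document that contains the literal character rather than the entity, which is one reading of "replace the HTML entity with UTF-8 bytes".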

1 Answer

For anyone encountering this now: I'm using the Windows binary 0.12.2.3 (on Windows 10 1809) and managed to get wkhtmltopdf to render emoji by setting the font family to a locally installed Windows font that contains emoji glyphs.

For a quick test, insert this into your HTML page and regenerate your PDF:

<style>
    body {
        font-family: "Noto Color Emoji", "Apple Color Emoji", "Segoe UI Emoji",
            Times, Symbola, Aegyptus, Code2000, Code2001, Code2002, Musica, serif,
            LastResort;
    }
</style>

You should see your emojis being rendered now, but you will likely have to tailor the above CSS to your own app.

To find out which font family to use, I went to the Full Emoji List: https://unicode.org/emoji/charts/full-emoji-list.html

Then I right-clicked an emoji that Chrome was rendering natively and copied its font family.

Steve Bauman
  • That works, but it forces you to use the Emoji font for the entire document. Where the HTML will render with the primary font and render the Emoji, this will use only the Emoji font. Odd that the Chromium rendering in wkhtmltopdf doesn't pick up the same CSS hierarchy as the raw HTML does. – Rick Strahl Jul 28 '21 at 05:38
  • @RickStrahl "You should see your emojis being rendered now, but you will likely have to tailor the above CSS to your own app." – Steve Bauman Jul 28 '21 at 13:22
  • The problem is that I usually use the primary fonts first and add the Emoji fonts at the end. This renders the primaries as usual, and emojis from the emoji fonts. Not sure how the browsers manage that, but it works in Chromium/Firefox browsers. In wkhtmltopdf it doesn't work with the emoji fonts at the end of the font list, which means you have to put them in the front as the primary font. Need to check latest though. – Rick Strahl Jul 28 '21 at 17:17
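
One possible way around the limitation discussed in these comments is to apply the emoji font stack only to the emoji themselves rather than to `body`. The sketch below is an untested assumption as far as wkhtmltopdf is concerned: the function name `wrap_emoji`, the class name `emoji`, and the code point range are all hypothetical choices, and the range is only a rough approximation of where most emoji live.

```python
import re

# Rough approximation (an assumption): most pictographic emoji fall in
# U+1F000..U+1FAFF. Extend the range if your content needs more.
_EMOJI_RUN = re.compile("[\U0001F000-\U0001FAFF]+")

# Font stack taken from the answer above, scoped to a class instead of body.
EMOJI_CSS = (
    '.emoji { font-family: "Noto Color Emoji", "Apple Color Emoji", '
    '"Segoe UI Emoji"; }'
)


def wrap_emoji(text, css_class="emoji"):
    """Wrap each run of emoji in a span so a dedicated font can target it.

    Intended for text content, not for strings that already contain markup
    with emoji inside attribute values.
    """
    return _EMOJI_RUN.sub(
        lambda match: f'<span class="{css_class}">{match.group(0)}</span>',
        text,
    )


wrapped = wrap_emoji("Yum \U0001F60B\U0001F60B!")
print(wrapped)
```

With `EMOJI_CSS` placed in a `<style>` element, the rest of the document can keep its primary font while only the wrapped runs use the emoji fonts.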