
I'm trying to create a PDF from an HTML page using the wicked_pdf (version 1.1) and wkhtmltopdf-binary gems. My HTML page contains a calendar emoji that displays well in the browser, whatever font I use:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv='content-type' content='text/html; charset=utf-8' />
  <style>
  unicode {
     font-family: 'OpenSansEmoji', sans-serif;
  }
  @font-face {
     font-family: 'OpenSansEmoji';
     src: url(data:font/truetype;charset=utf-8;base64,<-- encoded_font_base64_string-->) format('truetype');
  }
  </style>
</head>
<body>
  <div><unicode>&#128197;</unicode></div>
</body>
</html>
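The base64 string elided in the snippet above can be produced from a font file with core Ruby, in the same spirit as the Base64.strict_encode64 call suggested in the comments. This is a minimal sketch; the font_face_css helper and the font path are hypothetical.

```ruby
# Build an @font-face rule that embeds a TrueType font as a base64 data URI,
# mirroring the <style> block in the question. (Hypothetical helper.)
def font_face_css(family, font_bytes)
  # pack('m0') is strict base64 (no line breaks), same as Base64.strict_encode64.
  encoded = [font_bytes].pack('m0')
  <<~CSS
    @font-face {
      font-family: '#{family}';
      src: url(data:font/truetype;charset=utf-8;base64,#{encoded}) format('truetype');
    }
  CSS
end

# Usage (the path is a placeholder, not from the question):
# css = font_face_css('OpenSansEmoji', File.binread('path/to/OpenSansEmoji.ttf'))
```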

However, when I try to generate the PDF with the gem's WickedPdf.new.pdf_from_html_file method in the Rails console:

File.open(File.expand_path('~/<--pdf_filename-->.pdf'), 'wb+') { |f| f.write WickedPdf.new.pdf_from_html_file('<--absolute_path_of_html_file-->') }

I get the following result:

[Screenshot: PDF result with an unknown character]

As you can see, the calendar icon itself is displayed properly, but it is followed by a second character, and we do not know where it's coming from.

I have tried encoding the character in UTF-8, in UTF-16 and as a surrogate pair, as suggested by this related post (stackoverflow_emoji_wkhtmltopdf), and I looked at this issue (wkhtmltopdf_git_issue), but I still can't make this character disappear!
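For reference, this is what those encodings of the calendar emoji (&#128197;, i.e. U+1F4C5) look like. A small sketch using only core Ruby; the surrogate_pair helper is hypothetical and simply recomputes by hand what String#encode produces.

```ruby
# Code point of the calendar emoji used in the question (&#128197;).
CALENDAR = 0x1F4C5

# Split a supplementary-plane code point into a UTF-16 surrogate pair.
def surrogate_pair(code_point)
  unless (0x10000..0x10FFFF).cover?(code_point)
    raise ArgumentError, 'not a supplementary-plane code point'
  end

  offset = code_point - 0x10000
  high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
  low  = 0xDC00 + (offset & 0x3FF) # bottom 10 bits of the offset
  [high, low]
end

char = CALENDAR.chr(Encoding::UTF_8)

utf8_hex  = char.bytes.map { |b| format('%02X', b) }
utf16_hex = char.encode('UTF-16BE').unpack('n*').map { |u| format('%04X', u) }
manual    = surrogate_pair(CALENDAR).map { |u| format('%04X', u) }

puts "decimal reference: &##{CALENDAR};"
puts "hex reference:     &#x#{CALENDAR.to_s(16).upcase};"
puts "UTF-8 bytes:       #{utf8_hex.join(' ')}"
puts "UTF-16 units:      #{utf16_hex.join(' ')}"
puts "manual surrogates: #{manual.join(' ')}"
```

The UTF-8 form is four bytes (F0 9F 93 85) and the UTF-16 form is the surrogate pair D83D DCC5; both denote the same single character.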

Any clue is more than welcome.

Thanks in advance for your help!

EDIT

Following the comments from Eric Duminil and petkov.np, I can confirm that the code above works properly for me on Linux. This seems to be a Linux vs. macOS issue. Can anyone explain what the core of the issue in the macOS binding is, and whether it can be fixed?

Community
rico1892
  • It works just fine with your HTML and Ruby code. `gem list | grep pdf`: pdf-core (0.6.1), pdf-inspector (1.2.1), pdf-reader (1.4.0), wicked_pdf (1.1.0), wkhtmltopdf-binary (0.12.3.1). `ruby -v`: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux], on Linux Mint 17. – Eric Duminil Jan 14 '17 at 22:52
  • @EricDuminil I have tested in a Linux environment and it works. I just edited my question. – rico1892 Jan 16 '17 at 13:12

1 Answer


I've edited this answer several times; please see the notes at the end as well as the comments.

I'm using macOS 10.12.2 and have the same issue. I'm listing the browser and tool versions, although I suspect the biggest factor is the OS and its wkhtmltopdf build.

  • Chrome: Version 55.0.2883.95 (64-bit)
  • Safari: Version 10.0.2 (12602.3.12.0.1)
  • wkhtmltopdf: 0.12.3 (with patched qt)

I'm using the following example snippet:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <style type="text/css">
      p {
        font-family: 'EmojiSymbols', sans-serif;
      }
      @font-face {
        font-family: 'EmojiSymbols';
        src: local('EmojiSymbols-Regular.woff'), url('EmojiSymbols-Regular.woff') format('woff');
      }

      span:before {
        content: '\01F60B';
      }
    </style>
  </head>
  <body>
    <p>
      
      <span></span>
      &#x1F60B;
      &#128523;
      &#xf0;&#x9f;&#x98;&#x8b;
    </p>
  </body>
</html>
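The last line of the paragraph is meant to be the 'raw' UTF-8 bytes of U+1F60B. Note that HTML numeric character references always name Unicode code points, not bytes, so &#xf0;&#x9f;&#x98;&#x8b; decodes to four separate characters (U+00F0 followed by three C1 control characters) rather than one emoji; this may account for the odd rendering of that line, though the answer does not say so. The sketch below decodes each form; decode_numeric_refs is a hypothetical helper that handles only numeric references, not named entities.

```ruby
# Decode HTML numeric character references (&#NNN; and &#xHHH;).
# Each reference names one Unicode code point, not one byte.
def decode_numeric_refs(html)
  html.gsub(/&#[xX]([0-9a-fA-F]+);/) { Regexp.last_match(1).hex.chr(Encoding::UTF_8) }
      .gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr(Encoding::UTF_8) }
end

samples = {
  'hex reference'     => '&#x1F60B;',
  'decimal reference' => '&#128523;',
  '"byte" references' => '&#xf0;&#x9f;&#x98;&#x8b;'
}

samples.each do |label, html|
  points = decode_numeric_refs(html).codepoints.map { |cp| format('U+%04X', cp) }
  puts "#{label}: #{points.join(' ')}"
end

# The CSS escape used in span:before ('\01F60B') also names a single code point.
css_escape = '\01F60B'
css_char = css_escape.delete_prefix('\\').hex.chr(Encoding::UTF_8)
puts "CSS escape: #{format('U+%04X', css_char.ord)}"
```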

I'm calling wkhtmltopdf with the --encoding 'UTF-8' option.
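To reproduce this outside WickedPdf, the binary can be called directly with an argument array (no shell quoting involved). A minimal sketch using Ruby's standard Open3; the helper names and the default binary name 'wkhtmltopdf' on the PATH are assumptions.

```ruby
require 'open3'

# Build the argv for a direct wkhtmltopdf call with an explicit input encoding.
# (Hypothetical helper; assumes the binary is reachable as `binary`.)
def wkhtmltopdf_command(input_html, output_pdf, encoding: 'UTF-8', binary: 'wkhtmltopdf')
  [binary, '--encoding', encoding, input_html, output_pdf]
end

# Run it, raising with wkhtmltopdf's stderr if the exit status is non-zero.
def render_pdf(input_html, output_pdf, **opts)
  cmd = wkhtmltopdf_command(input_html, output_pdf, **opts)
  _stdout, stderr, status = Open3.capture3(*cmd)
  raise "wkhtmltopdf failed: #{stderr}" unless status.success?

  output_pdf
end

# Usage (paths are placeholders):
# render_pdf('emoji.html', 'emoji.pdf')
```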

You can see the rendered result here (I'm sorry for the lame screenshot). Some brief conclusions:

  1. Safari doesn't render the 'raw' UTF-8 bytes (the last line in the HTML paragraph) properly; it seems to treat them as a plain byte sequence. Otherwise, Safari renders everything fine.
  2. Chrome renders everything fine.
  3. With the option above, wkhtmltopdf renders the raw bytes (sort of) OK, but doesn't render the CSS content property properly. Every 'proper' occurrence of the Unicode symbol is followed by this strange phantom symbol.

I've tried literally everything, but the results are the same. To me, the fact that even Safari doesn't render the raw bytes properly indicates a system-level problem specific to macOS. It's unclear to me whether this should be reported as a wkhtmltopdf issue or whether some dependency in the macOS build is misbehaving.

EDIT: Safari seems to work fine; my markup was broken.

EDIT: A CSS workaround may do the trick, please check the comments below.

FINAL EDIT: As shown in the comments, the CSS 'hack' that solves the issue is text-rendering: optimizeLegibility;. This seems to be needed only on macOS/OS X.

From my comment below:

I just found this issue. It seems irrelevant at first glance, but adding text-rendering: optimizeLegibility; to my styles removed the duplicate characters (on macOS). Why this happens is beyond me. As the issue author also uses OS X, it's apparent there is some problem with wkhtmltopdf builds for this OS.
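One way to apply this workaround without editing every template is to inject the rule into the HTML before handing it to WickedPdf. A minimal sketch, assuming you can render from a string rather than a file; with_legibility_fix is a hypothetical helper, and putting the rule on body is an assumption (text-rendering is an inherited property, so descendants pick it up).

```ruby
# The CSS workaround from this answer, applied to body (assumed selector).
LEGIBILITY_CSS = '<style>body { text-rendering: optimizeLegibility; }</style>'.freeze

# Insert the workaround style just before </head>, or prepend it when the
# document has no head element. (Hypothetical helper.)
def with_legibility_fix(html)
  if html =~ %r{</head>}i
    html.sub(%r{</head>}i) { |closing_tag| LEGIBILITY_CSS + closing_tag }
  else
    LEGIBILITY_CSS + html
  end
end

# Usage with wicked_pdf (pdf_from_string renders an HTML string):
# pdf = WickedPdf.new.pdf_from_string(with_legibility_fix(File.read('page.html')))
```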

Iulian Onofrei
petkov.np
  • @petkov.np, can you try inlining the font as Base64 like the OP does? E.g.: `<%= Base64.strict_encode64(Rails.application.assets['EmojiSymbols-Regular.woff'].source) %>` – AmitA Jan 16 '17 at 10:24
  • I've tried base64-encoding the font as well, it doesn't make a difference in the rendered pdf. – petkov.np Jan 16 '17 at 11:42
  • Just tried rendering with `wkhtmltopdf` 0.12.2.4 on Ubuntu 16.04 and the result is as expected. – petkov.np Jan 16 '17 at 12:28
  • @petkov.np I have just tested my code in a Linux environment and it actually works. It might be a macOS binding issue. I just added this remark to my question. – rico1892 Jan 16 '17 at 13:15
  • I just found [this issue](https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1734). It seems irrelevant at first glance, but adding `text-rendering: optimizeLegibility;` to my styles removed the duplicate characters (on macOS). *Why* this happens is beyond me. As the issue author also uses osx, it's apparent there is some problem with `wkhtmltopdf` builds for this os. – petkov.np Jan 16 '17 at 16:39
  • It solved it! The unknown character has disappeared. I will try to investigate why, but thanks anyway. – rico1892 Jan 16 '17 at 17:07
  • @petkov.np It worked for us!! Can you edit your answer so I can give you the bounty? I believe it's going to help a lot of people in the same situation. Also, if you have more information on what the fix is and how you found it, it would be awesome! Thanks! – Erowlin Jan 16 '17 at 17:10
  • Glad I was able to help. `wkhtmltopdf` issues tend to be really frustrating. – petkov.np Jan 16 '17 at 17:37
  • Also, if you have more information on what the fix is, how you found it and how it works, it would be awesome! Thanks! – Erowlin Jan 16 '17 at 18:07
  • I don't have additional info really, just searched through all issues related to Unicode and OS X in the project's repo. I've gathered everything relevant here. – petkov.np Jan 16 '17 at 19:00
  • The source issue is on [GitHub](https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1734); I still couldn't find any actual solution for it. – Gokul May 22 '18 at 17:58