Below is my Prawn code that renders a name onto a PDF:

def initialize(opportunity_application)
  pdf = Prawn::Document.new(:page_size => [1536, 2048], :page_layout => :landscape)
  cell_1 = pdf.make_cell(content: "Eylül Çamcı".force_encoding('iso-8859-1').encode('utf-8'), borders: [], size: 66, :text_color => "000000", padding: [0,0,0,700], font: "app/assets/fonts/opensans.ttf")

  t = pdf.make_table [[cell_1]]
  t.draw
  pdf.render_file "tmp/mos_certificates/application_test.pdf"
end

When rendering the name Eylül Çamcı, which is Turkish, I get the following error:

Prawn::Errors::IncompatibleStringEncoding: Your document includes text that's not compatible with the Windows-1252 character set.
If you need full UTF-8 support, use TTF fonts instead of PDF's built-in fonts.

I'm already using a TTF font that supports the characters in that name. What can I do to print the name correctly?

Michael Victor
  • Are you following these instructions? https://stackoverflow.com/questions/37286976/ruby-how-to-use-different-fonts-in-prawn#37287069 – Fabrizio Bertoglio Sep 07 '17 at 04:50
  • I tried this as well, and it spouted the same error. Here is the gist of what I tried - https://gist.github.com/mikevic/e1617641704aed9d8642b54fb5ea0351 – Michael Victor Sep 08 '17 at 14:48
  • Aren't you missing `font "Opensans"`? I checked your gist. In the linked post they first register a new font family, `"Arial" => { :normal => "/assets/fonts/Arial.ttf", :italic => "/assets/fonts/Arial Italic.ttf", }`, and then tell `Prawnpdf` to use that font family with `font "Arial"` – Fabrizio Bertoglio Sep 09 '17 at 05:26

3 Answers

It seems Turkish is missing in iso-8859-1.

On the other hand iso-8859-9 should work.

So you could try changing your code like this (note the ISO number that I changed):

...
cell_1 = pdf.make_cell(content: "Eylül Çamcı".force_encoding('iso-8859-9').encode('utf-8'), borders: [], size: 66, :text_color => "000000", padding: [0,0,0,700], font: "app/assets/fonts/opensans.ttf")
...

And a fun link that covers not only the character set but also other internationalisation differences for Turkey.


Edit 1: I made a basic check, and it seems the text is already in UTF-8. So why convert it to ISO-8859 and back to UTF-8?

Can you please try "Eylül Çamcı".force_encoding('utf-8') alone?

irb(main):013:0> "Eylül Çamcı".encoding
=> #<Encoding:UTF-8>
irb(main):014:0> "Eylül Çamcı".force_encoding('UTF-8')
=> "Eylül Çamcı"
irb(main):015:0>
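
To illustrate why the round trip in the question is likely harmful rather than neutral, here is a minimal sketch in plain Ruby (no Prawn involved). It assumes the string starts out as UTF-8, as the irb check above suggests. The variable names `relabeled` and `mangled` are only for illustration.

```ruby
# The name, written with escapes so the source file encoding does not matter.
name = "Eyl\u00FCl \u00C7amc\u0131" # "Eylül Çamcı", UTF-8: 11 characters, 14 bytes

# force_encoding only relabels the existing bytes; dup avoids mutating the literal.
relabeled = name.dup.force_encoding("ISO-8859-1")

# encode then transcodes each of those 14 one-byte "characters" to UTF-8.
mangled = relabeled.encode("UTF-8")

puts name.length       # 11
puts relabeled.length  # 14
puts mangled.bytesize  # 20
puts mangled == name   # false
```

The result is a valid UTF-8 string, but it is double-encoded text and no longer the original name, so the font would be asked for the wrong characters.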

Edit 2: Also, can you check your font path? Does the font file exist, and is the path correct?

#Rails.root.join('app/assets/fonts/opensans.ttf')
cell_1 = pdf.make_cell(content: "Eylül Çamcı".force_encoding('utf-8'), borders: [], size: 66, :text_color => "000000", padding: [0,0,0,700], font: Rails.root.join('app/assets/fonts/opensans.ttf'))
Jonathon Reinhart
Mehmet Kaplan

I'm not sure I remember how Prawn works, but PDF files don't support UTF-8, which is the default Ruby encoding for String objects.

In fact, PDF files only support ASCII encoding using internal fonts - any other encoding requires that you bring your own font (which is also recommended for portability).

The workaround is to use character maps (CMaps), either custom CMaps or pre-defined ones (BYO font).

Generally, PDF files include an embedded font (or a subset of a font) and a CMap, which maps the value of a byte (or of a number of bytes) to the desired font glyph. For example, it could map 97, which is 'a' in ASCII, to the å glyph when using the specified font.
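
The byte-to-glyph idea can be sketched in a few lines of Ruby. This is only a conceptual toy: `TOY_MAP` and `decode_with_map` are hypothetical names, and real CMaps use their own syntax inside the PDF file rather than a Ruby hash.

```ruby
# Hypothetical mapping table: byte value => character the font should draw.
# Here 97 (ASCII 'a') is remapped to "å", as in the example above.
TOY_MAP = { 97 => "\u00E5" }.freeze

# Translate raw byte values into characters, falling back to the
# byte's own code point when the table has no entry for it.
def decode_with_map(bytes, map)
  bytes.map { |b| map.fetch(b) { b.chr(Encoding::UTF_8) } }.join
end

puts decode_with_map("bar".bytes, TOY_MAP) # the stored bytes spell "bar", the drawn text is "bår"
```

The point is that the bytes stored in the document and the glyphs that get drawn are connected only through the mapping that ships with the font.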

Last time I used Prawn, I think it supported TTF fonts and created font maps automatically using UTF-8 Strings for the text input, but you have to load an appropriate font into Prawn and remember to use it!

You can see an example in this answer.

Good Luck!

EDIT

I updated the answer to reflect @mkl's comments.

@mkl pointed out that other encodings are supported or possible (BYO font), including some predefined multibyte encodings (which use pre-defined CMaps).

Myst
  • *"In fact, PDF files only support ASCII encoding."* - this simply is wrong. There is a wide palette of possible encodings for fonts in PDFs, both single byte and multi byte. Merely UTF-8 happens not to be among them. – mkl Sep 18 '17 at 04:19
  • @mkl - I think you're mistaken. Multi-byte encodings aren't possible in the PDF format and any encoding other than ASCII (with a limited number of built in fonts) requires that you bring your own font and map the glyphs. You might be thinking of the authoring tool rather than the file format. – Myst Sep 18 '17 at 09:50
  • *"requires that you bring your own font"* - but what is the problem with that? Embedding fonts actually is a *necessity* if you want PDFs to be really *portable*. That being said, though, even if only considering the standard 14 fonts there is much more than merely ASCII; please have a look at Annex D of the PDF specification ISO 32000-1 (part 2 has been released this year but I could not compare yet). And beyond those standard 14 fonts, PDF supports many predefined multi-byte encodings (cf. e.g. section 9.7.5 in ISO 32000-1) and an option to build your own encodings. – mkl Sep 18 '17 at 11:11
  • @mkl - I updated my answer to reflect your comments. Let me know if you have further input. – Myst Sep 18 '17 at 11:55

From this answer about forcing strings to UTF-8 from any encoding:

"Forcing" an encoding is easy; however, it won't convert the characters, it just changes the encoding:

str = str.force_encoding("UTF-8")
str.encoding.name # => 'UTF-8'

If you want to perform a conversion, use encode.

Indeed, as @MehmetKaplan said:

It seems Turkish is missing in iso-8859-1.

On the other hand iso-8859-9 should work.

Therefore, you no longer need force_encoding, just encode:

[37] pry(main)> "Eylül Çamcı".encode('iso-8859-1')
Encoding::UndefinedConversionError: U+0131 from UTF-8 to ISO-8859-1
from (pry):39:in `encode'
[38] pry(main)> "Eylül Çamcı".encode('iso-8859-9')
=> "Eyl\xFCl \xC7amc\xFD"

This means you have to drop UTF-8 entirely in your code:

content: "Eylül Çamcı".encode('iso-8859-9'),
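
As a small sanity check of the encode behaviour described above, here is a sketch in plain Ruby. It only shows what the conversions do to the string. Whether Prawn accepts an ISO-8859-9 string is a separate matter that this snippet does not test. The variable names are illustrative.

```ruby
name = "Eyl\u00FCl \u00C7amc\u0131" # "Eylül Çamcı" as UTF-8

# encode transcodes: same characters, different bytes (one per character here).
latin5 = name.encode("ISO-8859-9")
puts latin5.bytesize                # 11
puts latin5.encode("UTF-8") == name # true, the round trip is lossless

# ISO-8859-1 has no dotless i (U+0131), so this conversion raises.
conversion_failed = false
begin
  name.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  conversion_failed = true
  puts "cannot convert: #{e.message}"
end
```

If a lossy conversion is acceptable, `encode` also takes options such as `undef: :replace`, but that would change the printed name.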
Kruupös
  • I'm still getting the same error :/ Do you think it has something to do with the fonts? I have checked on Google Fonts and Opensans supports the string I am trying with. – Michael Victor Sep 14 '17 at 03:30