28

I'm trying to create a PDF with the correct characters, but some of them come out as "?". I created a test PHP file to find the best solution. When I open the HTML in the browser, it looks fine:

UTF-8 --> UTF-8 : X Ponuka číslo € černý Češký 

But when I look at the PDF, I see this:

UTF-8 --> UTF-8 : X Ponuka ?íslo € ?erný ?ešký 

Here is my full code:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>č s š Š</title>
</head>
<body>
<?php 

require_once("dompdf/dompdf_config.inc.php");
$tab = array("UTF-8", "ASCII", "Windows-1250", "ISO-8859-2", "ISO-8859-1", "ISO-8859-6", "CP1256"); 
$chain = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <style></style><title>č s š Š</title></head><body>';
foreach ($tab as $i) 
    { 
        foreach ($tab as $j) 
        { 
            $chain .= "<br> $i --> $j : ".iconv($i, $j, 'X Ponuka číslo € černý Češký <br>'); 
        } 
    } 
$chain .= '<p style="font-family: firefly, verdana, sans-serif;">??????X Ponuka číslo € černý Češký <br></p></body></html>';
echo $chain; 
echo 'X Ponuka číslo € černý Češký <br>'; 

$filename = 'pdf/_1.pdf';
$dompdf = new DOMPDF();
$dompdf->load_html($chain, 'UTF-8');
$dompdf->set_paper('a4', 'portrait'); // change these if you need to
$dompdf->render();
file_put_contents($filename, $dompdf->output());

?> 
</body>
</html>

What am I doing wrong? I have tried many of the options I found :( Any ideas?

hakre
lostika
  • Most libraries do not let you load data in a different encoding than the one you explicitly tell the library to load. That often results in question marks. So I actually wonder why you think this should be different with DOMPDF? Also, trying every option can be okay for playing around, but if that does not give results quickly, you need a different strategy to understand what is going on. – hakre May 05 '13 at 13:11
  • I tried several options because it was hard to find out how it works. There isn't any usable information about the ISO-8859-2 charset; I googled a lot, and I wanted UTF-8, where every character is OK! – lostika May 05 '13 at 14:11
  • Yes, UTF-8 is a good choice if you want to support all characters known on computer systems. However, in your code above you apply multiple encodings to the *same* string. That can never work out well. Instead, it's better to find out which encoding your strings originally have, and then convert from that specific encoding into UTF-8. You should only do a single re-encoding here. This answer might be interesting for you as well: http://stackoverflow.com/a/5159071/367456 – hakre May 05 '13 at 14:13
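
The "single re-encoding" advice in the last comment can be sketched as follows. This is only an illustration, assuming the mbstring extension is available; `to_utf8_once()` is a hypothetical helper name, not part of dompdf, and the source encoding has to be known by the caller rather than guessed.

```php
<?php
// Hypothetical helper (not part of dompdf): convert exactly once,
// from a known source encoding, to UTF-8.
function to_utf8_once(string $text, string $sourceEncoding): string
{
    if (strcasecmp($sourceEncoding, 'UTF-8') === 0) {
        // Declared as UTF-8 already: validate instead of re-encoding.
        if (!mb_check_encoding($text, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        return $text;
    }
    // One conversion step, from the encoding the bytes really are in.
    return mb_convert_encoding($text, 'UTF-8', $sourceEncoding);
}

// "číslo" stored as ISO-8859-2 bytes (č = 0xE8, í = 0xED).
$latin2 = "\xE8\xEDslo";
$utf8   = to_utf8_once($latin2, 'ISO-8859-2');
```

The resulting `$utf8` string is what would then be handed to the PDF library, with no further `iconv()` passes over it.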

11 Answers

52

You should read over the Unicode How-to again. The main problem is that you don't specify a font that supports your characters. It looks like you've read the how-to, because you're using the font example from that document. However, that example was not meant to apply globally to any document: dompdf doesn't include firefly (a Chinese character font) or Verdana by default.

If you do not specify a font then dompdf falls back to one of the core fonts (Helvetica, Times Roman, Courier) which only support Windows ANSI encoding. So always be sure to style your text with a font that supports Unicode encoding and has the characters you need to display.
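
To see which characters of a string fall outside Windows ANSI, here is a minimal sketch. It assumes the mbstring extension and treats "Windows ANSI" as Windows-1252; `chars_outside_windows_ansi()` is a hypothetical helper for illustration, not a dompdf function.

```php
<?php
// Hypothetical helper: list the distinct characters of a UTF-8 string
// that have no slot in Windows-1252 (assumed here to be "Windows ANSI").
function chars_outside_windows_ansi(string $text): array
{
    $missing = [];
    // Split the UTF-8 string into single characters.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // Convert to Windows-1252 and back; unmappable characters are
        // replaced by mbstring's substitute character and do not survive.
        $roundTrip = mb_convert_encoding(
            mb_convert_encoding($char, 'Windows-1252', 'UTF-8'),
            'UTF-8',
            'Windows-1252'
        );
        if ($roundTrip !== $char && !in_array($char, $missing, true)) {
            $missing[] = $char;
        }
    }
    return $missing;
}

// For the test string from the question this yields "č" and "Č",
// the same characters that show up as "?" in the PDF.
$missing = chars_outside_windows_ansi('X Ponuka číslo € černý Češký');
```

The characters í, ý, š and € all have Windows-1252 positions, which matches the output in the question where only č and Č are lost.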

With dompdf 0.6.0 you can use the included DejaVu fonts. So the following should work (just the HTML):

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<style>
  body { font-family: DejaVu Sans, sans-serif; }
</style>
<title>č s š Š</title>
</head>
<body>
  <p>??????X Ponuka číslo € černý Češký <br></p>
</body>
</html>
BrianS
  • What version of dompdf? The DejaVu fonts were only included starting with 0.6.x. Also, multiple things can affect the output. E.g., your document should actually be encoded as UTF-8 as well as specifying that encoding in the header. – BrianS Nov 20 '14 at 14:29
  • The version was 0.6.1 and the font was set within the CSS on the body tag: `font-family: Helvetica,"Times New Roman", serif;` – andreas-supersmart Nov 21 '14 at 08:53
  • @andreas-manusm you'll need to use the DejaVu fonts if you use the character directly. The built-in fonts should be able to display the character if you encode it as `&#128;` (the Windows ANSI character position). – BrianS Nov 25 '14 at 19:31
  • Meanwhile I fixed this by writing "Euro"; I was using '€' before – andreas-supersmart Nov 27 '14 at 08:28
  • Thanks for pointing me to the DejaVu fonts. This time I had a precise template to fulfill. Best practice for the next project is to create a template/design based on the DejaVu font. – andreas-supersmart Nov 27 '14 at 08:43
  • This is working fine in latest dompdf (v0.7.0-beta2) downloaded from https://github.com/dompdf/dompdf/tags . – Xdg Oct 15 '15 at 18:27
  • I was searching for a solution for 3 days before I found this, and now it works perfectly. THANK YOU SO MUCH !!! – era-net Sep 15 '19 at 01:31
  • @BrianS How can I add my own font? – Alireza A2F Jan 27 '20 at 21:14
42

I got UTF-8 characters working with this combination. Before you pass the HTML to dompdf, convert the encoding like this:

$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');

Use the DejaVu font in your CSS:

*{ font-family: DejaVu Sans; font-size: 12px;}

Make sure you have set the UTF-8 encoding in the HTML <head> tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Now all special characters work: "ľ š č ť ž ý á í é"
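
Note that the 'HTML-ENTITIES' pseudo-encoding used above is deprecated as of PHP 8.2. A possible substitute, shown here only as a sketch, is `mb_encode_numericentity()`, which turns every non-ASCII character into a numeric entity. `utf8_to_numeric_entities()` is a hypothetical wrapper name, and whether this fits a given dompdf setup is an assumption that needs checking.

```php
<?php
// Hypothetical wrapper: encode every non-ASCII character of a UTF-8
// string as a decimal numeric entity, leaving the markup itself intact.
function utf8_to_numeric_entities(string $html): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $map, 'UTF-8');
}

// č is U+010D (269), í is U+00ED (237), € is U+20AC (8364).
$html = utf8_to_numeric_entities('<p>číslo €</p>');
// $html is now '<p>&#269;&#237;slo &#8364;</p>'
```

Unlike the 'HTML-ENTITIES' conversion, this produces numeric entities only (no named ones such as `&euro;`). The font still has to contain the glyphs, so the DejaVu rule above is still needed.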

Ejaz
Frantisek
26

Just add

  <style>
    *{ font-family: DejaVu Sans !important;}
  </style>

before </head>. It works for me.

Prasant Kumar
  • Also change def("DOMPDF_ENABLE_HTML5PARSER", false); to def("DOMPDF_ENABLE_HTML5PARSER", true); in the dompdf_config.inc.php file. – Prasant Kumar Feb 21 '18 at 06:59
3

Dompdf does not support fallback fonts, so you can't use your favorite font if it lacks the characters you need, and you also can't declare another font, such as Droid Sans Fallback, as the fallback for those characters.

What you can do instead is take advantage of regex Unicode script properties (https://www.regular-expressions.info/unicode.html) to wrap those runs of text in spans and give the spans the fallback font.

Example:

$body = 'test 简化字 彝語/彝语 test číslo € černý Češký';

$cjk_scripts = 'Bopomofo|Han|Hiragana|Katakana';
$cjk_scripts = preg_replace('/[a-zA-Z_]+/', '\\p{$0}', $cjk_scripts);

// wrap the CJK characters into a span with its own font
$body = preg_replace("/($cjk_scripts)+/isu", '<span class="cjk">$0</span>', $body);

// a font that supports CJK characters
$cjk_font_path = APP_PATH.'/fonts/DroidSansFallbackFull.ttf';

$html = <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<style type="text/css">
@font-face {
    font-family: 'DroidSansFallbackFull';
    font-style: normal;
    font-weight: 400;
    src: url('$cjk_font_path') format('truetype');
}
body {
    font-family: DejaVu Sans, sans-serif;
}
.cjk {
    font-family: DroidSansFallbackFull, sans-serif;
}
</style>
</head>
<body>$body</body>
</html>
HTML;

$dompdf = new \DOMPDF();
$dompdf->set_paper('A4');
$dompdf->load_html($html);
$dompdf->render();

$dompdf->stream('test.pdf', ['Attachment'=>0]);

Related: https://github.com/dompdf/dompdf/issues/1508

Timo Huovinen
2

utf8_decode() did the trick for me with some German translations like ä and ü.

echo utf8_decode('X Ponuka číslo € černý Češký <br>');
1

None of the answers mentioned here helped me. After hours of struggle I switched to niklasravnsborg/laravel-pdf, which has nearly the same syntax and usage, and everything works fine.

Fusion
1

If you don't mind having only one font, you can point every font family in dompdf_font_family_cache.dist.php to DejaVu Sans,

just like this:

<?php
$distFontDir = $rootDir . DIRECTORY_SEPARATOR . 'lib' . DIRECTORY_SEPARATOR . 'fonts' . DIRECTORY_SEPARATOR;
return array(
    'sans-serif' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'times' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'times-roman' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'courier' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'helvetica' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'zapfdingbats' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'symbol' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'serif' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'monospace' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'fixed' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'dejavu sans' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'dejavu sans mono' =>
    array(
        'bold' => $distFontDir . 'DejaVuSansMono-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSansMono-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSansMono-Oblique',
        'normal' => $distFontDir . 'DejaVuSansMono'
    ),
    'dejavu serif' =>
    array(
        'bold' => $distFontDir . 'DejaVuSerif-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSerif-BoldItalic',
        'italic' => $distFontDir . 'DejaVuSerif-Italic',
        'normal' => $distFontDir . 'DejaVuSerif'
    )
)
?>

I know it's not the best way, but it saves a lot of time.

pacholik
David Škarda
1

Chinese characters sometimes cause problems. The important part is to have a good font; here is a list you can download.

I chose the first one, named "Kai Bold Font"; here is a download page.

Then put it on your hosting service in a public folder. I put it into

http://192.168.10.10/fonts/pdf/wts11.ttf

and here is my HTML example:

$html = <<<EOT
<!DOCTYPE html>
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   <style>
    @font-face {
      font-family: chinese;
        src: url('http://192.168.10.10/fonts/pdf/wts11.ttf') format('truetype');
    }
    .chineseLanguage { font-family: chinese; }
      body {font-family: DejaVu Sans, sans-serif;}
   </style>
</head>
<body>
    Chinese
    <div class='chineseLanguage'>
        忠烈祠
        中文 - 这工作<br> 
    </div>
    hello world <br> 
    Russian - русский текст <br>
    Greek - α,β,γ,δ,ε <br>
    chars - !@#$%^&* -=- €   <br><br>
    <br>
    Hebrew (iw)<br><br>
    דג סקרן שט בים מאוכזב ולפתע מצא לו חברה איך הקליטה<br>
    <br>    
</body>
</html>
EOT;

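P.S. There is a small chance you will also need allow_url_fopen enabled. It is a system-level PHP setting, so it has to be set in php.ini rather than with ini_set() at runtime:

allow_url_fopen = On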
Yevgeniy Afanasyev
0

I had a similar problem and ended up using TCPDF. I hope this is helpful: http://www.tcpdf.org/
The problem was the font I was using. I was able to get the correct output using the 'freeserif' font. I guess it might be possible to get the same output using this font with dompdf.

$pdf->SetFont('freeserif', '', 12);

Here is the sample I used (the TCPDF UTF-8 sample):

<?php
header('Content-type: text/html; charset=UTF-8') ;//chrome
require_once('tcpdf_include.php');

// create new PDF document
$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);

$pdf->setFontSubsetting(true);

$pdf->SetFont('freeserif', '', 12);

$pdf->AddPage();

$utf8text = '
<html><head>  
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body>
<b>Ponuka číslo € černý Češký </b><br>
සිංහල  <br>
<u>தேமல </u> <br>
</body></html>';

$pdf->SetTextColor(0, 63, 127);

$pdf->writeHTML($utf8text, true, 0, true, true);

$pdf->Output('example_008.pdf', 'I');

?>
Deshan
0

I had the same problem and solved it very simply. Just import Google Fonts with the required language subset in the CSS file that is used when generating the HTML. Specify UTF-8 in your HTML file and it works.

@import url('https://fonts.googleapis.com/css?family=Roboto:400,700&subset=latin-ext');
body {font-family: 'Roboto', sans-serif;}
general666
0

There are lots of answers here, but I struggled to get any of them to provide cross-language support reliably. I believe that for those of us making distributed software, there are also server settings that block features such as @import and src:url() in dompdf from automatically embedding a font.

The following solution has worked across many servers & locally hosted sites, and requires no command line access:

  1. Retrieve the font you want to use as a .ttf (for language support including Cyrillic, Greek, Devanagari, Latin, and Vietnamese, we used Noto Sans with all optional languages checked)
  2. Run (or build in) the following script and call PDFBuilder_install_font_family() once only (a one-time install)

Gist for PDFBuilder_install_font_family(): https://gist.github.com/woodyhayday/f8dc36cc7ec922bc1894f33eb2b0e928

Woody Hayday