
Possible Duplicate:
Using PDFBox to write UTF-8 encoded strings to a PDF

I need to create a PDF containing Czech national characters, and I'm trying to do it with the PDFBox library. I copied the following code from some tutorials:

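// Imports assume the PDFBox 1.x package layout
import java.io.IOException;
import java.io.StringReader;

import org.apache.pdfbox.TextToPDF;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public void doIt(String file, String message) throws IOException, COSVisitorException
{
    PDDocument doc = null;
    try
    {
        PDSimpleFont font = PDType1Font.TIMES_ROMAN;

        TextToPDF textToPdf = new TextToPDF();
        textToPdf.setFont(font);
        textToPdf.setFontSize(12);

        // createPDFFromText returns a new document, so none is created beforehand
        doc = textToPdf.createPDFFromText(new StringReader(message));
        doc.save(file);
    }
    finally
    {
        // Always release the document, even if saving fails
        if( doc != null )
        {
            doc.close();
        }
    }
}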

Now I call the function doIt:

app.doIt("test.pdf", "Skákal pes přes oves, přes zelenou louku.");

This runs without errors, but the output PDF contains: "þÿSkákal pes pYes oves, pYes zelenou louku."

I tried to find out how to set UTF-8 encoding in PDFBox, but I could not find a solution anywhere on the internet.

Do you have any idea how to get the correct text in the output PDF?

Thank you.

Firzen
  • Please look for this question already being answered elsewhere on stackoverflow before asking. http://stackoverflow.com/questions/5425251/using-pdfbox-to-write-utf-8-encoded-strings-to-a-pdf – durron597 Nov 07 '12 at 17:22
  • I already looked at this. It looks great, but "š" is the only character that I am able to write into the PDF with the right encoding using its escape code. The character "š" has two different codes in the UTF-8 table, and only one of them works. (More here: http://doc.infosnel.nl/extreme_utf-8.html) Unfortunately, all the other characters I need have only one code in the UTF-8 table, and these codes do not work. And there are even more problems with escape codes. – Firzen Nov 07 '12 at 20:06
  • Did you try the answer about embedding Gentium or Doulos fonts? – durron597 Nov 07 '12 at 20:23
  • No, I didn't. I will use iText for this. I don't think these fonts will solve anything, though, because in the Czech Republic we normally use fonts like Times New Roman, Arial, etc. But thank you for your help. – Firzen Nov 07 '12 at 21:14

1 Answer

I think it's the PDType1Font.TIMES_ROMAN font that does not support your Czech national characters. If you can get a .ttf file that contains the Czech national characters, load it as a PDFont as shown below and use that instead:

      PDFont font = PDTrueTypeFont.loadTTF( doc, new File( "CheckRepFont.ttf" ) );

Here CheckRepFont.ttf is just an example font file name. Replace it with the actual one.
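As an aside on the symptom itself: the output reported in the question looks like what you would get if the text were encoded as UTF-16 with a big-endian byte order mark and then displayed one byte per character. This is only a sketch of that hypothesis in plain Java, not a description of what PDFBox does internally; the class name MojibakeDemo and the method simulate are made up for illustration, and the Czech letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: simulates reading UTF-16 bytes as single-byte text.
class MojibakeDemo {

    static String simulate(String text) {
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) followed by two bytes per char
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);

        // Interpret every byte as one ISO-8859-1 character, as a single-byte font encoding would
        String asLatin1 = new String(utf16, StandardCharsets.ISO_8859_1);

        // Drop control characters (the 0x00 and 0x01 high bytes), which are not visible on a page
        StringBuilder visible = new StringBuilder();
        for (char c : asLatin1.toCharArray()) {
            if (!Character.isISOControl(c)) {
                visible.append(c);
            }
        }
        return visible.toString();
    }

    public static void main(String[] args) {
        // "Skákal pes přes oves, přes zelenou louku." with á = U+00E1 and ř = U+0159
        String message = "Sk\u00e1kal pes p\u0159es oves, p\u0159es zelenou louku.";
        System.out.println(simulate(message));
    }
}
```

Under this assumption, "ř" (U+0159) becomes the bytes 01 59, and 0x59 is "Y", which would explain "pYes"; "á" (U+00E1) survives because its low byte is the same code in ISO-8859-1. If that is what happens, the missing piece is a font and encoding that can map these characters, rather than the string's UTF-8 encoding.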

EDIT:

  PDStream pdStream  = new PDStream(doc);
  PDSimpleFont font = PDType1Font.TIMES_ROMAN;
  font.setToUnicode(pdStream);
Yogendra Singh
  • I have already tried this, but it is even worse. I believe that this problem is not caused by Czech characters missing from the fonts; the characters are in these fonts. – Firzen Nov 07 '12 at 19:27
  • @user1735603 See if `setToUnicode` method in `PDSimpleFont` helps? – Yogendra Singh Nov 07 '12 at 19:40
  • Thanks for your help. I'm trying to use setToUnicode, but it wants an argument of type "COSBase", and I have absolutely no idea how to use it in this situation. (This also means that font.setToUnicode(pdStream) does not work for me.) – Firzen Nov 07 '12 at 19:57
  • @user1735603 `COSBase= The base object that all objects in the PDF document will extend`. Try setting your document object itself. Please look at my updated answer(I updated around 1/2 hour back). – Yogendra Singh Nov 07 '12 at 20:07
  • It is strange, but your code really does not work for me. I can't compile it. Which version of PDFBox do you use? I am using version 1.7.1. – Firzen Nov 07 '12 at 20:25
  • @user1735603: Check this. [PDSimpleFont.html#setToUnicode](http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/font/PDSimpleFont.html#setToUnicode(org.apache.pdfbox.pdmodel.common.PDStream)) and [PDStream (PDDocument)](http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/common/PDStream.html#PDStream(org.apache.pdfbox.pdmodel.PDDocument)). – Yogendra Singh Nov 07 '12 at 20:29
  • OMG, that is just unbelievable. Is the documentation wrong? This is what the compiler says: Exception in thread "main" java.lang.Error: Unresolved compilation problem: The method setToUnicode(COSBase) in the type PDSimpleFont is not applicable for the arguments (PDStream). PDFBox seems like a really funny library. I am new to this library, but that just sucks. – Firzen Nov 07 '12 at 20:38
  • @user1735603: Why don't you switch to `iText`? I think it is better than this, especially for Unicode handling, and it has better documentation and features. – Yogendra Singh Nov 07 '12 at 20:46
  • Yes, that seems to be the final solution for me. I have downloaded iText, and it works with national characters without any problems. Thank you very much! – Firzen Nov 07 '12 at 21:10