iText, Unicode characters and Java

Question

I have a text editing program that saves its output into a PDF file.

It also saves all the text into a PDF dictionary, from which it can be read back again. The problem is that in my native language characters like č, ć, đ, ž, š are pretty common...

When I type those characters in my program's GUI, it's fine, they all appear (I'm currently using Java's Arial font).

When I save it and open the PDF in Adobe Reader, č and ć are missing, while đ, ž and š are printed as they should be. I am using a custom (TrueType) font (BookAntiqua, downloaded from here).

Is this a problem with the encoding, or with the font itself (that it does not support the č and ć characters)?

Also, when I load the PDF into my program again, the missing letters are still missing, and đ, ž, š are swapped with ⎕ symbols and similar... Is that a problem with how the PDF is written, or is there something else?

Bottom line: I'd like those five characters to be visible in my program's GUI and in the PDF document, and to be retrieved properly from the PDF's dictionary.

score 3 · Accepted Answer · edited May 23 '17 at 12:24

I had the same problem. I solved it by switching to a font that supports those letters (I forget the name, maybe Arial Unicode? You will need to experiment to find out which font it was) and then embedding that font into the PDF. That worked perfectly. My name has some of those strange characters :)

Edit: There is a sample here showing how to do it and how to set it for fields too, and here is another code snippet that may be helpful.
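
Since the question asks whether the encoding or the font is to blame, here is a minimal sketch for checking the encoding side using only the JDK. It reports which of the five letters a given charset can represent at all. The class and method names (`EncodingCoverage`, `unencodable`) are made up for illustration. Whether it applies depends on how the font is created in the program, which the thread does not show: as far as I know, iText 5's `BaseFont.WINANSI` constant is `"Cp1252"`, while `BaseFont.IDENTITY_H` selects a Unicode-capable encoding, so treat the link to iText as an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EncodingCoverage {

    // Returns the characters of text that the named charset cannot encode.
    static String unencodable(String charsetName, String text) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // c-caron, c-acute, d-stroke, z-caron, s-caron, written as escapes so the
        // example does not depend on the encoding of the source file itself.
        String letters = "\u010D\u0107\u0111\u017E\u0161";
        for (String name : new String[] {"windows-1252", "windows-1250", "UTF-16BE"}) {
            int missing = unencodable(name, letters).length();
            System.out.println(name + " cannot encode " + missing
                    + " of " + letters.length() + " letters");
        }
    }
}
```

If the encoding in use cannot represent a letter, no choice of font will make it show up, so a check like this is worth running before swapping fonts.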

answered Sep 22 '12 at 11:36

I know how to embed the font into the PDF, that's not the problem; it seems the problem is the font I'm using... I've been looking for a while now... Could you recommend any fonts that resemble BookAntiqua or TimesNewRoman and that also support Unicode characters? – Ivan Karlovic Sep 22 '12 at 12:11
@IvanKarlovic no, I forgot the name, it was 6 or more years ago. Changing to a suitable font and embedding it into the PDF will surely solve the problem, but you will need to google for one. Here is a list: http://en.wikipedia.org/wiki/Unicode_font – Sep 22 '12 at 12:26
I found the font and it did help, but it did not solve my problem. Text read from the PDF still doesn't print those characters. I even tried this: `public static String unicodize(String string) { string.replace("Ć", "/u0106"); string.replace("ć", "/u0107"); string.replace("Č", "/u010C"); string.replace("č", "/u010D"); string.replace("Đ", "/u0110"); string.replace("đ", "/u0111"); string.replace("Š", "/u0160"); string.replace("š", "/u0161"); string.replace("Ž", "/u017D"); string.replace("ž", "/u017E"); return string; }` – Ivan Karlovic Sep 22 '12 at 12:56
Maybe "Ć", "\u0106" will fix it? :) '/u' vs '\u': reverse the slash in the public static String unicodize function – Sep 22 '12 at 13:14
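
A note on the `unicodize` attempt in the comments above: as far as I can tell, it cannot change the text with either kind of slash. Java strings are immutable, so `replace` returns a new string and the original code throws that result away. With a backslash, the compiler converts the escape into the character itself, so the call would replace a letter with the same letter. The sketch below reduces the function to one letter to show both effects. The class and method names are invented for illustration, and the letter is written as an escape only to keep the source ASCII.

```java
final class UnicodizeDemo {

    // Mirrors the attempt from the comment, reduced to one letter: the String
    // returned by replace() is discarded, so the argument comes back unchanged.
    static String discardedReplace(String s) {
        s.replace("\u0106", "/u0106");
        return s;
    }

    // Keeps the result, which makes the forward-slash version visibly wrong:
    // it inserts the six literal characters /u0106 into the text.
    static String forwardSlash(String s) {
        return s.replace("\u0106", "/u0106");
    }

    // With a backslash the compiler turns the escape into the character itself,
    // so this replaces U+0106 with U+0106 and changes nothing.
    static String backslash(String s) {
        return s.replace("\u0106", "\u0106");
    }

    public static void main(String[] args) {
        String name = "\u0106iro"; // U+0106 (C with acute) followed by "iro"
        System.out.println(discardedReplace(name).equals(name)); // input unchanged
        System.out.println(forwardSlash(name));                  // literal /u0106 in the text
        System.out.println(backslash(name).equals(name));        // still unchanged
        System.out.println("\u0106".length());                   // a single char
    }
}
```

In other words, a Java `String` already holds Unicode characters, so no rewriting of this kind is needed. If letters are lost on the way back from the PDF, the more likely place to look is how the text is encoded when it is written into the dictionary and decoded when it is read, though the thread does not show that code.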
