I created a PDF document with the Apache PDFBox library. My problem is encoding the euro currency symbol when drawing a string on the page, because the base font Helvetica does not provide this character. How can I convert the output "þÿ ¬" to the symbol "€"?
-
That's not a font problem, it's a character encoding problem. One application is writing the Euro symbol using one encoding, and another application is reading it under the assumption that it's using a different character encoding. What software are you using to write and run your code? – Bobulous Mar 07 '14 at 20:49
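The mismatch described in this comment can be reproduced outside PDFBox. The following is only a sketch under an assumption: that the string was written as UTF-16 with a byte order mark and then interpreted byte by byte in a Windows-1252-like encoding. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EuroMojibakeDemo {
    /** Encodes the text as UTF-16 (with BOM), then misreads the bytes as windows-1252. */
    static String misread(String text) {
        // Java's UTF-16 encoder writes the BOM FE FF followed by big-endian code units.
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        // Assumption: the reader treats every byte as one windows-1252 character.
        return new String(utf16, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // U+20AC is the euro sign; the result is the four characters þ, ÿ, space, ¬.
        System.out.println(misread("\u20AC"));
    }
}
```

Under that assumption the four bytes FE FF 20 AC turn into exactly the "þÿ ¬" reported in the question.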
-
Thanks @Arkanon. I write and run the code with Eclipse. Is it possible to set the encoding type in PDFBox, or is there nothing to be done? – Carmine Mar 07 '14 at 21:07
-
I ran into a similar problem when I tried to display Arabic characters. http://stackoverflow.com/questions/26039280/write-arabic-caracters-with-pdfbox – Genjuro Sep 25 '14 at 15:58
4 Answers
Unfortunately, PDFBox's string encoding is still far from perfect (version 1.8.x). It uses the same routines when encoding strings in generic PDF objects as when encoding strings in content streams, which is fundamentally wrong. Thus, instead of using PDPageContentStream.drawString (which uses that wrong encoding), you have to translate to the correct encoding yourself.
E.g. instead of using
contentStream.beginText();
contentStream.setTextMatrix(100, 0, 0, 100, 50, 100);
contentStream.setFont(PDType1Font.HELVETICA, 2);
contentStream.drawString("€");
contentStream.endText();
contentStream.close();
which results in garbled output (the "þÿ ¬" described in the question),
you could use something like
contentStream.beginText();
contentStream.setTextMatrix(100, 0, 0, 100, 50, 100);
contentStream.setFont(PDType1Font.HELVETICA, 8);
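// Build the raw show-text operation; the 'x' is only a placeholder for the glyph code.
byte[] commands = "(x) Tj ".getBytes();
// 128 (octal 200) is the code of the € glyph in WinAnsiEncoding.
commands[1] = (byte) 128;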
contentStream.appendRawCommands(commands);
contentStream.endText();
contentStream.close();
resulting in a correctly drawn € symbol.
If you wonder how I got to use 128 as byte code for the €, have a look at the PDF specification ISO 32000-1, annex D.2, Latin Character Set and Encodings which indicates an octal value 200 (decimal 128) for the € symbol in WinAnsiEncoding.
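As a quick cross-check of that table entry, the code can also be looked up programmatically. This is a minimal sketch that assumes Java's `windows-1252` charset agrees with WinAnsiEncoding for the character in question (the two are closely related but not formally the same thing); `WinAnsiCodeDemo` and `codeOf` are illustrative names.

```java
import java.nio.charset.Charset;

final class WinAnsiCodeDemo {
    /**
     * Returns the single-byte code of c in windows-1252 as an unsigned value.
     * Characters that the charset cannot represent are replaced with '?' (63).
     */
    static int codeOf(char c) {
        byte[] encoded = String.valueOf(c).getBytes(Charset.forName("windows-1252"));
        return encoded[0] & 0xFF; // mask to get 0..255 instead of a signed byte
    }

    public static void main(String[] args) {
        int code = codeOf('\u20AC'); // U+20AC is the euro sign
        System.out.println(code + " decimal, " + Integer.toOctalString(code) + " octal");
    }
}
```

For the euro sign this yields 128 decimal (200 octal), matching the value taken from the specification.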
PS: An alternative approach has meanwhile been presented by other answers which in case of the € symbol amounts to something like:
contentStream.beginText();
contentStream.setTextMatrix(100, 0, 0, 100, 50, 100);
contentStream.setFont(PDType1Font.HELVETICA, 8);
contentStream.drawString(String.valueOf(Character.toChars(EncodingManager.INSTANCE.getEncoding(COSName.WIN_ANSI_ENCODING).getCode("Euro"))));
contentStream.endText();
contentStream.close();
This indeed also draws the '€' symbol. But even though this approach looks cleaner (it does not use byte arrays, and it does not construct an actual PDF stream operation manually), it is dirty in its own way:
To use a broken method, it actually breaks its string argument in just the right way to counteract the bug in the method.
Thus, if the PDFBox people decided to fix the broken PDFBox method, this seemingly clean work-around code here would start to fail as it would then feed the fixed method broken input data.
Admittedly, I doubt they will fix this bug before 2.0.0 (and in 2.0.0 the fixed method has a different name), but one never knows...

-
Excellently written and detailed answer, but do the glyph graphics need to be so ginormous? – Bobulous Mar 09 '14 at 15:27
-
Very true. But I fear the Euro symbol above might just be visible from the International Space Station. – Bobulous Mar 09 '14 at 17:29
-
Thanks, you freak. This is a good solution, but I am reading data from XML that contains text with this symbol; how do I encode it within the text? @Arkanon: mkl did it so that it can be seen from my house... – Carmine Mar 09 '14 at 21:48
-
In the referenced PDF specification annex you'll find tables describing (among others) the encoding which PDF calls **WinAnsiEncoding**. Using this information you can map your text data to the bytes to use. – mkl Mar 09 '14 at 22:59
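A rough sketch of such a mapping for whole strings follows. It assumes that Java's `windows-1252` charset is a good enough stand-in for the WinAnsiEncoding table, and it escapes the characters that are special inside a PDF literal string. `WinAnsiShowText` and `showTextCommand` are hypothetical names, not PDFBox API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class WinAnsiShowText {
    // Assumption: windows-1252 approximates WinAnsiEncoding for the text at hand.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Builds the raw bytes of a "(...) Tj " operation for the given text. */
    static byte[] showTextCommand(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('(');
        for (byte b : text.getBytes(CP1252)) {
            // Parentheses and backslashes must be escaped inside a PDF literal string.
            if (b == '(' || b == ')' || b == '\\') {
                out.write('\\');
            }
            out.write(b); // only the low 8 bits are written
        }
        out.write(')');
        out.write(' ');
        out.write('T');
        out.write('j');
        out.write(' ');
        return out.toByteArray();
    }
}
```

The resulting array could then be handed to `contentStream.appendRawCommands(...)` in the same way as the hand-built `commands` array in the answer. Characters that the charset cannot represent come out as `?`, so this does not help with text outside the Latin range.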
-
I don't quite get the meaning of "(x) Tj ". Could you please explain, as I'm facing a similar problem printing an RTL language with PDFBox? – Genjuro Sep 25 '14 at 15:56
-
**(x) Tj** means "starting at the current text position, draw left-to-right a sequence of glyphs represented by the byte sequence *x*". The meaning of the individual bytes in the byte sequence is given by the encoding of the current font. For RTL writing you'll have to write backwards. – mkl Sep 25 '14 at 21:55
This worked for me:
char symbol = '€';
Encoding e = EncodingManager.INSTANCE.getEncoding(COSName.WIN_ANSI_ENCODING);
String toPDF = String.valueOf(Character.toChars(e.getCode(e.getNameFromCharacter(symbol))));

-
This line `String toPDF = String.valueOf(Character.toChars(e.getCode(e.getNameFromCharacter(symbol))));` seemed to throw an IOException, complaining: No character code for character name 'euro' – Blake Apr 21 '15 at 08:20
I put together a solution from the several above:
String text = "Lorem ipsum dolor sit amet, consectetur adipisici € 1.234,56 " +
"elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua.";
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.moveTextPositionByAmount(10, 500);
// Translate each character to its WinAnsiEncoding code via its glyph name.
char[] tc = text.toCharArray();
StringBuilder te = new StringBuilder();
Encoding e = EncodingManager.INSTANCE.getEncoding(COSName.WIN_ANSI_ENCODING);
for (int i = 0; i < tc.length; i++) {
    char c = tc[i];
    int code;
    if (Character.isWhitespace(c)) {
        // Whitespace is handled explicitly; see the note on "spacehackarabic" below.
        code = e.getCode("space");
    } else {
        code = e.getCode(e.getNameFromCharacter(c));
    }
    te.appendCodePoint(code);
}
contentStream.drawString(te.toString());
contentStream.endText();
contentStream.close();
For the space character, the code lookup fails because the name returned is "spacehackarabic", which is not defined in WinAnsiEncoding; I do not know why this name is returned. That is why I check for whitespace explicitly, but it is also possible to map this name to the code of the regular space:
e.addCharacterEncoding( 040, "spacehackarabic" );
Thanks...

Maybe it's too late, but I did it using:
String toPDF = String.valueOf(Character.toChars(e.getCode("Euro")));
Make sure you use an uppercase "E"; if you write "euro", it throws an error. Please take a look at this link, it helped me a lot: http://partners.adobe.com/public/developer/en/opentype/glyphlist.txt
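To illustrate why the capitalisation matters, here is a small stand-alone sketch. It does not use PDFBox; the map is a hand-copied excerpt of glyph names and codes from the WinAnsiEncoding table, and `GlyphNameLookupDemo` and `codeFor` are made-up names. Glyph names are case-sensitive ("A" and "a" are different glyphs), so "euro" is simply a different, unknown key.

```java
import java.util.HashMap;
import java.util.Map;

final class GlyphNameLookupDemo {
    // Hand-copied excerpt of WinAnsiEncoding (glyph name -> code), for illustration only.
    static final Map<String, Integer> WIN_ANSI_EXCERPT = new HashMap<>();
    static {
        WIN_ANSI_EXCERPT.put("space", 32);
        WIN_ANSI_EXCERPT.put("A", 65);
        WIN_ANSI_EXCERPT.put("a", 97);
        WIN_ANSI_EXCERPT.put("Euro", 128);
    }

    /** Looks up a glyph name exactly as written; the lookup is case-sensitive. */
    static int codeFor(String glyphName) {
        Integer code = WIN_ANSI_EXCERPT.get(glyphName);
        if (code == null) {
            throw new IllegalArgumentException(
                    "No character code for character name '" + glyphName + "'");
        }
        return code;
    }
}
```

With this excerpt, `codeFor("Euro")` returns 128, while `codeFor("euro")` throws, which mirrors the behaviour described above.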
