5

I am trying to save Arabic words in an editable PDF. It works fine with English text, but when I use Arabic words I get this exception:

java.lang.IllegalArgumentException: U+0627 is not available in this font Helvetica encoding: WinAnsiEncoding

Here is how I generate the PDF:

public static void main(String[] args) throws IOException
{
  String formTemplate = "myFormPdf.pdf";
  try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate)))
  {
    PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
    if (acroForm != null)
    {
        PDTextField field = (PDTextField) acroForm.getField( "sampleField" );
        field.setValue("جملة");
    }
    pdfDocument.save("updatedPdf.pdf"); 
  }
}
Danyal Sandeelo

2 Answers

3

This is how I made it work; I hope it helps others. Use a font that supports the language you want to write into the PDF.

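import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public static void main(String[] args) throws IOException
{
  String formTemplate = "myFormPdf.pdf";

  try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate)))
  {
    PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
    if (acroForm != null)
    {
        // you can read the ttf from resources as well, this is just for testing
        PDFont font = PDType0Font.load(pdfDocument, new File("/path/to/font.ttf"));
        // register the font in the form's default resources and keep its resource name
        String fontName = acroForm.getDefaultResources().add(font).getName();
        PDTextField field = (PDTextField) acroForm.getField( "sampleField" );
        // "/<name> 0 Tf" selects the font with automatic sizing, "0 g" sets black fill
        field.setDefaultAppearance("/" + fontName + " 0 Tf 0 g");
        field.setValue("جملة");
    }

    pdfDocument.save("updatedPdf.pdf");
  }
}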

Edit: adding mkl's comment. The font name and the font size are parameters of the Tf instruction, and the gray value 0 for black is the parameter for the g instruction. Parameters and instruction names must be appropriately separated.
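As a comment on this answer points out, overwriting the default appearance also discards any font size or color the field already had. The following is a minimal sketch of one way to swap only the font name and keep the rest. The class `DefaultAppearance` and the method `withFont` are hypothetical names for illustration, not part of PDFBox, and the regular expression assumes a simple default appearance string of the form `/Name size Tf ...`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class DefaultAppearance {
    // Matches "/FontName size Tf"; group 2 captures the size operand.
    private static final Pattern FONT_OP = Pattern.compile("/(\\S+)\\s+(\\S+)\\s+Tf");

    // Returns a default appearance string that uses newFontName but keeps
    // the size and any color operators found in existingDa.
    static String withFont(String existingDa, String newFontName) {
        if (existingDa == null || existingDa.trim().isEmpty()) {
            // Nothing to preserve: automatic size (0) and black fill (0 g).
            return "/" + newFontName + " 0 Tf 0 g";
        }
        Matcher m = FONT_OP.matcher(existingDa);
        if (!m.find()) {
            // No font instruction present: prepend one, keep the remainder.
            return "/" + newFontName + " 0 Tf " + existingDa.trim();
        }
        // Replace only the font name; "$2" re-inserts the original size.
        return m.replaceFirst(Matcher.quoteReplacement("/" + newFontName) + " $2 Tf");
    }

    public static void main(String[] args) {
        System.out.println(withFont("/Helv 11 Tf 0.5 g", "F3")); // /F3 11 Tf 0.5 g
    }
}
```

With PDFBox this could be used as `field.setDefaultAppearance(DefaultAppearance.withFont(field.getDefaultAppearance(), fontName));`, so a fixed size or a non-black color set in the template survives the font change.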

Danyal Sandeelo
  • First of all, there is a small error: There must be a space between `0` and `g` in the default appearance string. Furthermore, your solution can be improved: You completely ignore a possible previous default appearance value. Obviously you want to replace the font name, but if the field is planned to have a fixed font size (instead of the `0` size before **Tf** asking for automatic sizing) or a non-black color (you enforce black via `0 g`), you might want to keep the original settings instead of replacing them. – mkl Apr 04 '19 at 09:48
  • Also you load the font so that it will be subset'ed. If the field value won't be changed after your filling it in, that's ok. But if some other PDF processor afterwards shall probably change it, this can mean trouble. But you surely don't want to add the whole of e.g. ARIALUNI either. Thus, you probably should fully embed a font with a character set limited to the expected inputs for that field. Or subset a large font but make sure that the character required for the expected inputs is contained in the subset. – mkl Apr 04 '19 at 09:54
  • 0g seems to work fine. I am using `11 TF 0g` and looks all okay on the pdf. They would be generated and printed. This is just a sample, my actual code is a lot different but yes that's how I have made it work – Danyal Sandeelo Apr 04 '19 at 11:02
  • *"0g seems to work fine"* - Only because your viewer ignores some kinds of errors and black fill color is default anyways. `0g` is rubbish. – mkl Apr 04 '19 at 11:59
  • so "11 Tf 0g" should be updated with " 11 Tf 0 g" ? – Danyal Sandeelo Apr 04 '19 at 12:33
  • Yes. The font name and the font size are parameters of the **Tf** instruction, and the gray value `0` for black is the parameter for the **g** instruction. Parameters and instruction names must be appropriately separated. – mkl Apr 04 '19 at 13:13
  • @mkl when I generated the pdf with `11 Tf 0g`, I could not edit it anymore with Acrobat Reader. Once I changed the code back to `11 Tf 0 g`, it is editable even after generation – Danyal Sandeelo Apr 07 '19 at 06:08
  • As I said, the space is required. – mkl Apr 07 '19 at 06:26
  • yes, that's needed. I have updated my answer with your comment as well – Danyal Sandeelo Apr 07 '19 at 08:02
  • @mkl there is an issue: Calibri works fine on Windows but not on Linux. Calibri is not a free font, I understand that, and it comes with Windows. The PDFBox API has Helvetica, Times and another font, but they do not support Arabic, so can you suggest a font that supports Arabic which I could use instead of Calibri? – Danyal Sandeelo Apr 12 '19 at 10:24
  • Do you use the same calibri font file in both contexts? Then you appear to have an issue with encodings, somewhere you seem to rely on some default encoding which might differ there. Or do you use jres at different version states in both contexts? – mkl Apr 12 '19 at 10:42
  • I will check the JRE versions but one is on websphere and another one is on my local machine springboot which uses tomcat. Font is same on both. Would I need to install fonts in any case? I am putting the font in my resources and reading it from there – Danyal Sandeelo Apr 14 '19 at 05:54
  • I am using the liberationSans font now, since it works fine on Linux, but it gives a similar error: `No glyph for U+000A in font liberationSans` – Danyal Sandeelo Apr 14 '19 at 08:42
  • *"No glyph for U+000A"* means you have a *line feed* character in your string. Don't try to draw control characters. – mkl Apr 14 '19 at 15:30
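The last comment above says not to pass control characters such as a line feed to the field. A minimal sketch of one way to clean a value before calling `setValue` follows. The class `FieldText` and the method `stripControlChars` are hypothetical names, and replacing each control character with a space is an assumption; a multiline field may call for different handling.

```java
// Hypothetical helper, not part of PDFBox.
final class FieldText {
    // Replaces every ISO control character (line feed, tab, ...) with a space,
    // since fonts usually have no glyph for them.
    static String stripControlChars(String value) {
        StringBuilder cleaned = new StringBuilder(value.length());
        value.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp)) {
                cleaned.append(' ');
            } else {
                cleaned.appendCodePoint(cp);
            }
        });
        return cleaned.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripControlChars("line one\nline two")); // line one line two
    }
}
```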
1

You need a font which supports those Arabic symbols.
Once you've got a compatible font, you can load it using PDType0Font:

final PDFont font = PDType0Font.load(...);

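A Type 0 font is a composite font: it wraps a descendant CIDFont and uses a multi-byte encoding, so it can address far more glyphs than the 256 codes available to a simple font such as Helvetica with WinAnsiEncoding. Which symbols it can actually show still depends on the glyphs in the font file you load.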
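To see why the exception in the question appears for Arabic but not for English, here is a small sketch that checks whether a string fits into a single-byte Latin encoding. It assumes that the Java charset `windows-1252` is a close enough stand-in for PDF's WinAnsiEncoding (the two are similar but not identical); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox.
final class SingleByteCheck {
    // Assumption: windows-1252 approximates PDF's WinAnsiEncoding.
    static boolean fitsWinAnsiApprox(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsiApprox("sentence")); // true
        System.out.println(fitsWinAnsiApprox("جملة"));     // false
    }
}
```

A value for which this returns false cannot be written with a standard font like Helvetica and needs an embedded font loaded through `PDType0Font`.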

See also the Cookbook - working with fonts (no examples with Type 0, but still useful).

LppEdd
  • I see. Where would I need to set the font? since I am not doing anything related to `font` in my code – Danyal Sandeelo Apr 01 '19 at 11:44
  • @DanyalSandeelo see the cookbook I linked. There is an example of setting a font. – LppEdd Apr 01 '19 at 11:46
  • @DanyalSandeelo if after you've tried you still receive the error, or somehow you can't figure out how to do it, reply here and post the updated code you have – LppEdd Apr 01 '19 at 12:02
  • it's working fine when I create a pdf but in case of editing a pdf, it doesn't work and gives the same exception. – Danyal Sandeelo Apr 01 '19 at 13:01
  • @DanyalSandeelo Can you post the code and the complete stacktrace? – LppEdd Apr 01 '19 at 13:05
  • This sample is working https://stackoverflow.com/questions/48284888/writing-arabic-with-pdfbox-with-correct-characters-presentation-form-without-bei/48346903#48346903 and the way I am setting values to acro fields, it doesn't work.. Code is on development machine, do not have internet there due to security – Danyal Sandeelo Apr 01 '19 at 13:06
  • btw "I'll stop contributing to StackOverflow once I get to 10k points." haha, I kind of have the same mission. :D – Danyal Sandeelo Apr 01 '19 at 13:08
  • @DanyalSandeelo mmmh can you post the exception that is thrown when you open the file? – LppEdd Apr 01 '19 at 13:18
  • exception is thrown at ` field.setValue("جملة");` and I have updated the exception in the question.. – Danyal Sandeelo Apr 01 '19 at 13:22
  • @DanyalSandeelo look here, seems the right approach for you https://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/interactive/form/CreateSimpleFormWithEmbeddedFont.java – LppEdd Apr 01 '19 at 13:22
  • it's creating a pdf, creating acrofields, assigning the default resources to form, and assign font to field. I already have form and fields, I am now trying to override the font of existing pdf to see how it works. – Danyal Sandeelo Apr 01 '19 at 13:53
  • @DanyalSandeelo yeah I meant "look at it, because it is setting `setDefaultAppearance`" – LppEdd Apr 01 '19 at 13:53
  • @DanyalSandeelo told you `setDefaultAppearance` was the key ;) – LppEdd Apr 01 '19 at 15:05