1

I have looked through all the questions related to this issue on SO, but can't find an answer.

I have a text file which contains Unicode characters like "ā", "š", "ī" and others. The problem is that when I write the text file to a PDF, the PDF file does not display them correctly.

How should I set up my code so that I can write these characters to my PDF? Maybe an even better question is: is that even possible? I have been looking for this for a few hours and can't find a solution.

Since this app will be commercial, I can't use iText!

My Code:

TextToPDF pdf = new TextToPDF();
String fileName = "test.txt";
File pdfFile = new File("test.pdf");

BufferedReader reader = new BufferedReader(new FileReader(fileName));

PDSimpleFont courier = PDType1Font.COURIER;
PDSimpleFont testFont = PDTrueTypeFont.loadTTF( document, new File("times.ttf" ));

pdf.setFont(testFont);
pdf.setFontSize(8);

pdf.createPDFFromText(document, reader);

document.save(pdfFile);
document.close();

If someone has done this, please share how you managed to do it. I believe it is related to font.setFontEncoding(), but since the PDFBox documentation is missing quite a lot of information, I haven't figured out what I should do or how.

By the way, here is the list of SO questions I have read, so please don't redirect me back to them...

1) Java PDFBOX text encoding

2) Using Java PDFBox library to write Russian PDF

3) Using PDFBox to write UTF-8 encoded strings to a PDF

There were more topics that I read, but these were the ones still open in my tabs.

EDITED: Just found this -> Using PDFBox to write unicode strings to a PDF

It seems it's not possible with my current version; I need to update to version 2.0.0 and give it a try.

EDITED #2: In the new version, PDFBox 2.0.0, the class TextToPDF(), which let me pass in a text file, has been removed (at least for now). So that means I either read the text manually and then write it to the PDF, or need to find some other solution.

arccuks

3 Answers

0

Your problem is here:

BufferedReader reader = new BufferedReader(new FileReader(fileName));

As described here: http://docs.oracle.com/javase/7/docs/api/java/io/FileReader.html, FileReader reads the file using the system default encoding. Change it to this:

// Same variable names as in the question; "UTF-8" is the canonical charset name
BufferedReader reader = new BufferedReader(
           new InputStreamReader(
                      new FileInputStream(fileName), "UTF-8"));

This reads your file as UTF-8, provided it actually is UTF-8. Special characters like the ones you describe also exist in other character encodings, such as some of the ISO 8859 (Latin) encodings, so the file is not necessarily UTF-8.

When you know the encoding of your input, make sure to read it in that encoding. Then PDFBox can write the characters in its desired encoding, too.
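To check whether the reader is decoding the file the way you expect, a small standalone test can help. The following is only a sketch using the standard library: the class name `EncodingCheck` and the helper `firstLine` are made up for illustration, and the sample text is written to a temporary file instead of your real test.txt. It decodes the same bytes once as UTF-8 and once as ISO-8859-1 and prints the resulting lengths.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of PDFBox or of the question's code.
public class EncodingCheck {

    /** Reads the first line of a file, decoding its bytes with the given charset. */
    public static String firstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("encoding-check", ".txt");
        try {
            // "āšī" written as Unicode escapes so the source file encoding does not matter
            Files.write(file, "\u0101\u0161\u012b".getBytes(StandardCharsets.UTF_8));

            String right = firstLine(file, StandardCharsets.UTF_8);
            String wrong = firstLine(file, StandardCharsets.ISO_8859_1);

            // Each of the three characters takes two bytes in UTF-8, so the
            // wrong charset turns three characters into six.
            System.out.println(right.length()); // 3
            System.out.println(wrong.length()); // 6
        } finally {
            Files.delete(file);
        }
    }
}
```

If the length (or the printed text) only looks right with one particular charset, that is the charset to pass to the reader that feeds PDFBox.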

Rene M.
  • Do you know that your file is UTF-8? Try printing the content of the file to standard out to see whether Java has read it correctly. If so, then your PDFBox needs some more setup. Otherwise you are still using the wrong encoding to read the file. Note that the encoding of a text file cannot be detected with 100% certainty. You have to know it or you have to try. – Rene M. Jul 13 '15 at 13:35
  • I am using Notepad++ and I have set the text file's encoding to `UTF-8`. Thanks for trying to help, but it seems I need to try out the pre-release version 2.0.0. – arccuks Jul 13 '15 at 13:40
  • In Notepad++ you can switch the encoding to view the file in that encoding, and you can also convert the file to an encoding! Use the set-encoding option to find the encoding in which the content displays correctly. Then use the same encoding in your program to read that file. – Rene M. Jul 13 '15 at 13:42
0

Just found this -> Using PDFBox to write unicode strings to a PDF

It seems it's not possible with my current version; I need to update to version 2.0.0 and give it a try.

EDITED #2: In the new version, PDFBox 2.0.0, the class TextToPDF(), which let me pass in a text file, has been removed (at least for now; a comment says that it is available now). So that means I either read the text manually and then write it to the PDF, or need to find some other solution.
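The manual route could look roughly like the sketch below. It assumes PDFBox 2.0.x on the classpath, a UTF-8 encoded test.txt, and a times.ttf that actually contains glyphs for "ā", "š" and "ī"; the class name `UnicodeTextToPdf` and the margin and leading values are made up for illustration. The font is loaded with `PDType0Font.load`, which embeds the TrueType font in a way that supports Unicode text.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical class name; a sketch, not a replacement for TextToPDF.
public class UnicodeTextToPdf {
    public static void main(String[] args) throws IOException {
        float fontSize = 8f;
        float leading = 1.5f * fontSize; // assumed line spacing
        float margin = 40f;              // assumed page margin in points

        try (PDDocument document = new PDDocument();
             BufferedReader reader = Files.newBufferedReader(
                     Paths.get("test.txt"), StandardCharsets.UTF_8)) {

            // Embed the TrueType font as a Type 0 font so Unicode text can be shown
            PDFont font = PDType0Font.load(document, new File("times.ttf"));

            PDPage page = new PDPage();
            document.addPage(page);
            float startY = page.getMediaBox().getHeight() - margin;

            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(font, fontSize);
                content.setLeading(leading);
                content.newLineAtOffset(margin, startY);

                String line;
                while ((line = reader.readLine()) != null) {
                    content.showText(line); // fails if the font lacks a glyph
                    content.newLine();
                }
                content.endText();
            }

            // Save before the document is closed by try-with-resources
            document.save("test.pdf");
        }
    }
}
```

This sketch puts everything on a single page and does not wrap long lines or start new pages, so it only covers short files. Characters the font has no glyph for (tabs, for example) make `showText` throw, so they need to be replaced or removed first.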

arccuks
  • *"removed the class TextToPDF"* - not entirely, it merely has been moved to a separate JAR, pdfbox-tools.jar, which also is available via maven or as download from pdfbox.apache.org: https://pdfbox.apache.org/download.cgi#20x – mkl Feb 01 '17 at 17:18
  • When I did this, either there was a lack of information about it or that JAR hadn't been made yet. – arccuks Feb 03 '17 at 11:32
-2
You cannot create a PDF simply by creating a file with a .pdf extension.
You are creating the PDF file like this: "**File pdfFile = new File("test.pdf")**", but that isn't the correct way. Please go through the code below to see how to create a PDF file.

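    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    // The class name is only illustrative
    public class CreateBlankPdf {
        public static void main(String[] args) throws IOException {
            // create() is static so it can be called from main without an instance
            create("test.pdf");
        }

        // Creates a PDF with a single blank page and saves it under the given name
        public static void create(String file) throws IOException {
            PDDocument document = null;
            try {
                document = new PDDocument();
                PDPage blankPage = new PDPage();
                document.addPage(blankPage);
                document.save(file);
            }
            finally {
                // Always release the document, even if saving failed
                if (document != null) {
                    document.close();
                }
            }
        }
    }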

Also go through the link below: **http://www.javased.com/api=org.apache.pdfbox.pdmodel.PDDocument**
  • 1st: I don't see how this is ever going to resolve my Unicode problem. 2nd: The link points to PAGE_NOT_FOUND. 3rd: For the last one, I think it's better to use try-with-resources, like `try(PDDocument doc = new PDDocument())` – arccuks Jul 13 '15 at 14:26