I am trying to extract some data from a PDF file in Java using Apache PDFBox (1.8.9). I have added the jar to my build path and classpath (in Eclipse Mars).

I am getting a NullPointerException while creating a PDFTextStripper object.

import java.io.File;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.pdmodel.PDDocument;

public class MainClass {

    public static void main(String[] args) {
        PDDocument pd ;

        try{

          StringBuilder sb = new StringBuilder();       

          File input = new File("C:\\Result.pdf");
          pd = PDDocument.load(input);

          PDFTextStripper s = new PDFTextStripper();

        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }

}

The error I am getting is:

java.lang.NullPointerException
at org.apache.pdfbox.util.TextNormalize.findICU4J(TextNormalize.java:54)
at org.apache.pdfbox.util.TextNormalize.<init>(TextNormalize.java:45)
at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:229)
at MainClass.main(MainClass.java:17)

(Line 17 is where I create the PDFTextStripper object.)

fabian
Pranav95
  • http://stackoverflow.com/questions/24009467/pdfbox-printing-null-pointer-exception-while-printing-using-pdfbox may be same issue – seenukarthi Aug 19 '15 at 11:20
  • If you use version 1.8.9, the only way this can occur is if `Class.getClassLoader()` returns `null`. That can happen if the class is loaded by the bootstrap classloader. You could try to use a different classloader; see http://stackoverflow.com/a/2832330/2991525 – fabian Aug 19 '15 at 11:54
  • Thank you. That helped. I removed the jars from the classpath and used the extensions class loader instead of the bootstrap class loader. – Pranav95 Aug 21 '15 at 15:32
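
The comments above hinge on `Class.getClassLoader()` being allowed to return `null` for classes loaded by the bootstrap class loader. The following is a minimal sketch of how one might check which loader a class came from. It uses only the JDK; `ClassLoaderCheck` and `describeLoader` are hypothetical names chosen for illustration and are not part of PDFBox.

```java
// Minimal sketch: shows that Class.getClassLoader() may return null.
// ClassLoaderCheck and describeLoader are illustrative names, not PDFBox API.
class ClassLoaderCheck {

    // Returns a readable description of the loader that loaded the given class.
    static String describeLoader(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // Per the Class.getClassLoader() Javadoc, null can represent the bootstrap loader.
        return loader == null ? "bootstrap (null)" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // java.lang.String is loaded by the bootstrap loader.
        System.out.println("String: " + describeLoader(String.class));
        // A class started from the normal classpath has a non-null loader.
        System.out.println("ClassLoaderCheck: " + describeLoader(ClassLoaderCheck.class));
    }
}
```

With PDFBox on the classpath, passing `PDFTextStripper.class` to the same helper would indicate whether the library ended up on the bootstrap loader, which is the situation described in the comments.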

2 Answers

You are missing a dependency. Please ensure that the three jars shown in the screenshot below are present in your classpath:

[Image: screenshot listing the three jars]

I ran the code from your question with the above three jars and did not get any NPE.

Also, please check your pdfbox-1.8.9.jar and make sure that it is not corrupted.
The PDFTextStripper class is in pdfbox-1.8.9.jar, so it looks to me like this jar is corrupted.
Download the jar again and retry.

Amit Bhati
  • Why do you think the `PDFTextStripper` class is missing? That would have caused an exception before `<init>` is pushed to the stack. – fabian Aug 19 '15 at 11:45

Checking the source of the TextNormalize class (the class at the top of the stack trace), it appears that a ClassNotFoundException is caught and icu4j is simply set to null.

You need the ICU4J jar as a dependency. These classes are loaded at runtime.

From TextNormalize:

// see if we can load the icu4j classes from the classpath
try
{
    this.getClass().getClassLoader().loadClass("com.ibm.icu.text.Bidi");
    this.getClass().getClassLoader().loadClass("com.ibm.icu.text.Normalizer");
    icu4j = new ICU4JImpl();
}
catch (ClassNotFoundException e)
{
    icu4j = null;
}
jozzy
  • Correct code snippet, but incorrect conclusion: since the NPE occurs in the line `this.getClass().getClassLoader().loadClass("com.ibm.icu.text.Bidi");`, the only way an NPE can be thrown is if `this.getClass().getClassLoader()` returns `null`, which may happen if the class is loaded by the bootstrap classloader (see http://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#getClassLoader--) – fabian Aug 19 '15 at 11:35
  • @Jozzy ICU4J jar is not a dependency of pdfbox. – Amit Bhati Aug 19 '15 at 11:35
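
To illustrate the difference between the two failure modes discussed in this thread, here is a rough sketch of a null-safe variant of the optional class lookup. It is not PDFBox code: `SafeClassProbe`, `loaderFor`, and `isAvailable` are hypothetical names, and falling back to the system class loader is just one possible choice, assumed here for illustration.

```java
// Sketch only: a null-safe optional class lookup, not the PDFBox implementation.
class SafeClassProbe {

    // Uses the owner's loader, or the system loader when the owner was
    // loaded by the bootstrap loader (getClassLoader() returned null).
    static ClassLoader loaderFor(Class<?> owner) {
        ClassLoader loader = owner.getClassLoader();
        return loader != null ? loader : ClassLoader.getSystemClassLoader();
    }

    // Reports whether className can be loaded; a missing class yields false
    // (ClassNotFoundException) rather than a NullPointerException.
    static boolean isAvailable(String className, Class<?> owner) {
        try {
            loaderFor(owner).loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // String.class has a null loader, so the fallback path is exercised here.
        System.out.println(isAvailable("java.lang.String", String.class));
        // Result depends on whether ICU4J is on the classpath.
        System.out.println(isAvailable("com.ibm.icu.text.Bidi", SafeClassProbe.class));
    }
}
```

In this sketch a missing ICU4J jar leads to the `catch` branch, while a `null` class loader is handled before `loadClass` is called. In the original snippet only the first case is caught, which is consistent with the NPE in the question coming from the loader rather than from a missing jar.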