
Hi, I'm using Apache POI 3.6, and I've already written some code:

XWPFDocument doc = new XWPFDocument(new FileInputStream(file));
XWPFWordExtractor wordxExtractor = new XWPFWordExtractor(doc);
String text = wordxExtractor.getText();

System.out.println("adding docx " + file);
d.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));

Unfortunately, it throws an error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:149)
at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:136)
at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:98)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:53)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:98)
at org.apache.lucene.demo.Indexer.indexDocs(Indexer.java:153)
at org.apache.lucene.demo.Indexer.main(Indexer.java:88)

It seems that it used the constructor

XWPFWordExtractor(OPCPackage container)

and not this one:

XWPFWordExtractor(XWPFDocument document)

Any idea why? Or does anyone know how I can extract the text of a .docx file into a String?

Berkay Turancı
Doli

3 Answers


You need to add the dom4j library to your classpath or to your project libraries.
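
A quick way to confirm the jar really is visible at runtime is to ask the class loader for the exact class named in the stack trace. This is a minimal sketch: the class name `Dom4jCheck` and the helper `locate` are hypothetical, and it only uses JDK calls (`Class.forName` and the class's `CodeSource`), so it compiles without POI. Run it with the same classpath you use for the indexer:

```java
import java.security.CodeSource;

public class Dom4jCheck {
    // Hypothetical helper: returns where className was loaded from,
    // "(bootstrap/JDK)" for JDK classes that have no code source,
    // or null if the class cannot be loaded at all.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, Dom4jCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap/JDK)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError from the question
        String where = locate("org.dom4j.DocumentException");
        if (where == null) {
            System.out.println("dom4j is NOT on the classpath");
        } else {
            System.out.println("dom4j loaded from " + where);
        }
    }
}
```

If it reports that dom4j is missing even though you added the jar in your IDE, the jar is probably not on the classpath of the JVM that actually runs the program.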

Deitek

It looks like you don't have all of the dependencies on your classpath.

If you look at http://poi.apache.org/overview.html, you'll see that dom4j is a required library when working with the OOXML files. From the exception you got, it seems that you don't have it. If you look in the POI binary download, you should find it in the ooxml-libs subdirectory.
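
To find every missing OOXML jar up front, rather than one NoClassDefFoundError at a time, you could probe one class from each jar before constructing the `XWPFDocument`. This is a sketch under assumptions: `OoxmlDependencyCheck`, `missingClasses` and the marker-class list are illustrative, not part of POI, and the list reflects my understanding of POI 3.6's OOXML dependencies (poi-ooxml, dom4j, xmlbeans, poi-ooxml-schemas), so compare it with the overview page for your version:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OoxmlDependencyCheck {
    // Assumed marker classes, one per jar that OOXML support in POI 3.6 is believed
    // to need; illustrative only, check against the overview page for your version.
    static final List<String> OOXML_MARKERS = Arrays.asList(
        "org.apache.poi.xwpf.usermodel.XWPFDocument",                          // poi-ooxml
        "org.dom4j.DocumentException",                                         // dom4j
        "org.apache.xmlbeans.XmlObject",                                       // xmlbeans
        "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1"   // poi-ooxml-schemas
    );

    // Hypothetical helper: returns the names (in input order) that cannot be loaded.
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = OoxmlDependencyCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // Load without initializing; a missing jar shows up as an exception here
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(OOXML_MARKERS);
        if (missing.isEmpty()) {
            System.out.println("All OOXML marker classes found");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```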

Gagravarr

You could try docx4j instead; see http://dev.plutext.org/svn/docx4j/trunk/docx4j/src/main/java/org/docx4j/TextUtils.java

JasonPlutext