
I am writing an application that searches the content of documents. I have already written the code for searching plain-text files, the kind that can be edited in Notepad.

I would also like to do the same for docx files. After some research, I have found these two options:

  1. http://www.infoq.com/articles/cracking-office-2007-with-java : This method requires me to extract the docx file and then search the XML files inside it. However, the extraction step adds overhead and, frankly, I don't know how to process an XML file (discarding attribute content, etc.).

  2. http://www.javadocx.com/download : This method lets me import a JAR library into my project, and supposedly I can create docx files with it. What I don't understand is how to open existing docx files using it.

Can anyone recommend an alternative method, or help with either of the two methods mentioned above?

Nishant

1 Answer


Try Apache Tika (http://tika.apache.org/), docx4j, or Apache POI. All three can read docx files.
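
If adding a library is not an option, the unzip-and-parse method from the question can also be sketched with the JDK alone. A docx file is a ZIP archive whose main body is stored in the entry `word/document.xml`, and the visible text sits inside `w:t` elements, so a streaming parser can collect it without unpacking anything to disk and without looking at attributes. The class name `DocxTextSearch` and its methods are made up for this illustration. The sketch assumes the usual (transitional) WordprocessingML namespace and ignores headers, footers, footnotes, tabs and line breaks, which the libraries above handle for you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of any library.
final class DocxTextSearch {

    // Namespace of the w: prefix in transitional .docx files (assumption: not a Strict OOXML file).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of the main document part, one line per paragraph. */
    static String extractText(InputStream docx) throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Read the entry straight from the archive stream; nothing is written to disk.
                if ("word/document.xml".equals(entry.getName())) {
                    return readRuns(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a docx file?");
    }

    /** Case-insensitive substring search over the extracted text. */
    static boolean contains(InputStream docx, String term)
            throws IOException, XMLStreamException {
        return extractText(docx).toLowerCase(Locale.ROOT)
                .contains(term.toLowerCase(Locale.ROOT));
    }

    private static String readRuns(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // documents are untrusted input
        XMLStreamReader reader = factory.createXMLStreamReader(xml);

        StringBuilder text = new StringBuilder();
        boolean inText = false; // true while the cursor is inside a <w:t> element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                inText = W_NS.equals(reader.getNamespaceURI())
                        && "t".equals(reader.getLocalName());
            } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                if ("t".equals(reader.getLocalName())) {
                    inText = false;
                } else if ("p".equals(reader.getLocalName())) {
                    text.append('\n'); // end of a paragraph
                }
            }
        }
        return text.toString();
    }
}
```

A call such as `DocxTextSearch.contains(new FileInputStream("report.docx"), "invoice")` (the file name is a placeholder) would then fit next to the existing plain-text search. Because the text of adjacent runs is concatenated within a paragraph, a phrase that Word has split across several formatting runs is still found.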

JasonPlutext