
I'm writing an application that reads a docx file. It appears that I may need to read the formatting of the text, and not just the content. I have googled the matter, but I can't find a search term that turns up what I'm looking for; most results point me to formatted text inputs and the like.

Does anyone know what class I should be using?

skaffman
Wing
  • Is this what you're looking for (http://msdn.microsoft.com/en-us/library/dd773189%28v=office.12%29.aspx)? – jefflunt Jan 05 '12 at 15:11
  • http://msdn.microsoft.com/en-us/library/aa338205%28v=office.12%29.aspx would seem to imply that a combination of XML and ZIP stuff should be able to represent the contents of a .DOCX file. If that's the case, just use XML-like structures, which will store both the content and the formatting. There are plenty of XML libraries available in Java. I haven't looked too closely at the .DOCX format, but if it's basically XML, then you should be able to read the document straight-up with one of the libraries. – jefflunt Jan 05 '12 at 15:12
  • There's also this http://stackoverflow.com/questions/7731948/java-library-for-reading-word-documents and this http://stackoverflow.com/questions/6608071/searching-docx-files-in-java which provide resources that appear to include actual libraries that read .DOCX format. – jefflunt Jan 05 '12 at 15:15
  • I have no problem reading the file. It's just a zip file containing xml. What I need is a library to represent formatted text (italic, bold, underline only) in memory and not as part of an InputPane of some sort. – Wing Jan 05 '12 at 15:22
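
For the in-memory representation asked about in the last comment, one option is a plain list of text runs, each carrying a set of style flags. The sketch below is only an illustration, not an established library: `StyledRuns`, `Run`, `Style`, `parse` and `isOn` are hypothetical names, and it assumes the input is the content of `word/document.xml` with formatting applied directly on each `w:r` run. It ignores formatting inherited from paragraph or character styles and does not handle tracked changes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StyledRuns {
    // WordprocessingML main namespace used by word/document.xml.
    public static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public enum Style { BOLD, ITALIC, UNDERLINE }

    /** Hypothetical value type: a span of text plus the styles applied to it. */
    public static final class Run {
        public final String text;
        public final Set<Style> styles;
        Run(String text, Set<Style> styles) { this.text = text; this.styles = styles; }
        @Override public String toString() { return styles + " \"" + text + "\""; }
    }

    /** Parses document XML into runs; only direct run formatting is considered. */
    public static List<Run> parse(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
            List<Run> runs = new ArrayList<>();
            NodeList runNodes = doc.getElementsByTagNameNS(W, "r");
            for (int i = 0; i < runNodes.getLength(); i++) {
                Element r = (Element) runNodes.item(i);
                Set<Style> styles = EnumSet.noneOf(Style.class);
                if (isOn(r, "b")) styles.add(Style.BOLD);
                if (isOn(r, "i")) styles.add(Style.ITALIC);
                if (isOn(r, "u")) styles.add(Style.UNDERLINE);
                // A run may hold several w:t elements; concatenate their text.
                StringBuilder text = new StringBuilder();
                NodeList texts = r.getElementsByTagNameNS(W, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    text.append(texts.item(j).getTextContent());
                }
                runs.add(new Run(text.toString(), styles));
            }
            return runs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    /** True if the run has the property element and it is not switched off. */
    static boolean isOn(Element run, String localName) {
        NodeList props = run.getElementsByTagNameNS(W, localName);
        if (props.getLength() == 0) return false;
        String val = ((Element) props.item(0)).getAttributeNS(W, "val");
        return !(val.equals("false") || val.equals("0")
              || val.equals("off") || val.equals("none"));
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>"
            + "<w:r><w:rPr><w:b/></w:rPr><w:t>Bold</w:t></w:r>"
            + "<w:r><w:t> plain</w:t></w:r>"
            + "</w:p></w:body></w:document>";
        for (Run run : parse(xml)) {
            System.out.println(run);
        }
    }
}
```

If the text later needs to be rendered, the same flags could probably be mapped onto `java.text.AttributedString` with `java.awt.font.TextAttribute` keys, which also keeps styled text in memory without any UI component.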

1 Answer

Apache POI should give you (at least some) access to Excel styles, such as colors and fonts. I'm not sure about exotic cases, but it is certainly possible to obtain a cell's font color, for example. The following code works for me:

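// XSSFCell is in org.apache.poi.xssf.usermodel; IndexedColors is in org.apache.poi.ss.usermodel
XSSFCell cell = ...
// Compare the indexed color of the cell's font against the WHITE palette entry
if (IndexedColors.WHITE.getIndex() == cell.getCellStyle().getFont().getColor()) {
  ...
}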
maximdim