I'm trying to parse document.xml inside a DOCX archive, but I'm stuck because I can't retrieve a NodeList with XPath.

File docxFile = new File("input.docx");
URI docxUri = URI.create("jar:" + docxFile.toURI());
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties);
Path documentXmlPath = zipFS.getPath("/word/document.xml");

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));
NodeList paragraphs = doc.getElementsByTagName("w:p");
System.out.println(paragraphs.getLength()); // gives real number of nodes

XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "//w:p";
NodeList nodes = (NodeList) xpath.compile(expression).evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength()); // gives 0

The DOM getElementsByTagName() method works fine, but XPath does not. What am I doing wrong?
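
One likely cause, offered as a sketch rather than a confirmed diagnosis: the factory is namespace-aware, and a prefix such as `w` in an XPath expression is resolved through the `NamespaceContext` set on the `XPath` object, not through the `xmlns:w` declaration in the document. `getElementsByTagName("w:p")` matches on the qualified name, so it does not need that mapping. The class name `WordXPathSketch`, the helpers `parse` and `countParagraphs`, and the tiny inline XML are hypothetical and only for illustration; the namespace URI is assumed to be the usual WordprocessingML main namespace and should be checked against the root element of your document.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class WordXPathSketch {
    // Assumption: the URI bound to the "w" prefix on the root of word/document.xml.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: parse an XML string with a namespace-aware builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: count w:p elements after mapping "w" to W_NS for XPath.
    static int countParagraphs(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("prefix must not be null");
                }
                return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return W_NS.equals(uri) ? "w" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return W_NS.equals(uri)
                        ? Collections.singletonList("w").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate("//w:p", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for document.xml: two paragraphs in the "w" namespace.
        String xml = "<w:document xmlns:w=\"" + W_NS + "\">"
                + "<w:body><w:p/><w:p/></w:body></w:document>";
        System.out.println(countParagraphs(parse(xml)));
    }
}
```

In the code from the question, the same `setNamespaceContext` call would go right after `newXPath()`, before the expression is compiled. If registering a context is not convenient, an expression such as `//*[local-name()='p']` avoids prefixes altogether, at the cost of matching `p` elements from any namespace.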

Vitaliy
