I'm trying to parse `document.xml` inside a DOCX archive, but I'm stuck because I can't retrieve a `NodeList` with XPath.
```java
import java.io.File;
import java.net.URI;
import java.nio.file.*;
import java.util.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

// Open the DOCX (a ZIP archive) as a file system and locate word/document.xml
File docxFile = new File("input.docx");
URI docxUri = URI.create("jar:" + docxFile.toURI());
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties);
Path documentXmlPath = zipFS.getPath("/word/document.xml");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));

// DOM lookup by qualified name
NodeList paragraphs = doc.getElementsByTagName("w:p");
System.out.println(paragraphs.getLength()); // prints the real number of w:p nodes

// Same lookup via XPath
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "//w:p";
NodeList nodes = (NodeList) xpath.compile(expression).evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength()); // prints 0
```
The DOM `getElementsByTagName()` method works fine, but XPath does not. What am I doing wrong?
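A likely explanation, offered as a sketch rather than a definitive diagnosis: `setNamespaceAware(true)` only affects parsing. The `XPath` object resolves prefixes such as `w:` through its own `NamespaceContext`, and none is set in the code above, so `w:p` cannot match the parsed elements. The example below binds the `w` prefix with `XPath.setNamespaceContext` and then evaluates `//w:p`. It assumes the usual (transitional) WordprocessingML namespace URI `http://schemas.openxmlformats.org/wordprocessingml/2006/main`; Strict OOXML files use a different URI. The class name `WordXPathSketch` and its helpers are hypothetical. To keep the example standalone, it parses a small inline XML string instead of opening `input.docx`.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WordXPathSketch {
    // Assumed: main WordprocessingML namespace of a typical (transitional) DOCX
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Minimal NamespaceContext that binds only the "w" prefix (plus the reserved "xml")
    static final NamespaceContext WORD_NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("prefix is null");
            if ("w".equals(prefix)) return W_NS;
            if (XMLConstants.XML_NS_PREFIX.equals(prefix)) return XMLConstants.XML_NS_URI;
            return XMLConstants.NULL_NS_URI; // unbound prefix
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return W_NS.equals(namespaceURI) ? "w" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return W_NS.equals(namespaceURI)
                    ? Collections.singletonList("w").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Hypothetical helper: parse XML text with a namespace-aware parser
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: count w:p elements with XPath once the prefix is bound
    static int countParagraphs(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(WORD_NS); // tell XPath what "w:" means
        NodeList nodes = (NodeList) xpath.evaluate("//w:p", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>One</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Two</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        Document doc = parse(xml);
        System.out.println(countParagraphs(doc)); // prints 2
    }
}
```

With the real file, the same `xpath.setNamespaceContext(...)` call would go right after `newXPath()` in the original snippet. For the plain DOM route, the namespace-aware counterpart of `getElementsByTagName("w:p")` is `doc.getElementsByTagNameNS(W_NS, "p")`. It matches on namespace URI and local name, so it does not rely on the document using the literal prefix `w`.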