Is there any way other than using XPath for this?

Question

Hello guys, I'm writing this program:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DOMbooks {
   public static void main(String[] args) throws Exception {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder docBuilder = factory.newDocumentBuilder();
      File file = new File("books-fixed.xml");
      Document doc = docBuilder.parse(file);
      // "*" returns every element in document order; item(0) is the root element, so the loop starts at 1
      NodeList list = doc.getElementsByTagName("*");
      // incremented before printing, so start at 0 to make the first book "BOOK 1"
      int bookCounter = 0;
      for (int i = 1; i < list.getLength(); i++) {
         Element element = (Element)list.item(i);
         String nodeName = element.getNodeName();
         if (nodeName.equals("book")) {
            bookCounter++;
            System.out.println("BOOK " + bookCounter);
            String sequence = element.getAttribute("sequence");
            System.out.println("\tsequence:\t" + sequence);
         }
         // the sample XML in this question spells this element <auther>
         else if (nodeName.equals("author") || nodeName.equals("auther")) {
            System.out.println("\tAuthor:\t" + element.getTextContent());
         }
         else if (nodeName.equals("title")) {
            System.out.println("\tTitle:\t" + element.getTextContent());
         }
         else if (nodeName.equals("publishYear")) {
            System.out.println("\tpublishYear:\t" + element.getTextContent());
         }
         else if (nodeName.equals("genre")) {
            System.out.println("\tgenre:\t" + element.getTextContent());
         }
      }
   }
}

I want to print all the data about the "Science Fiction" books. I know I should use XPath, but I'm stuck with too many errors... any suggestions? Assuming I have this XML, I only want to select the Science Fiction books with all their info:

<book sequence="5">
  <title>Aftershock</title>
  <auther>Robert B. Reich</auther>
  <publishYear>2010</publishYear>
  <genre>Economics</genre>
</book>
<book sequence="6">
  <title>The Time Machine</title>
  <auther>H.G. Wells</auther>
  <publishYear>1895</publishYear>
  <genre>Science Fiction</genre>
</book>


Why is XPath stuck with so many errors? It has been the de facto tool for querying XML for over 15 years now and is very stable. Which processor are you using that you encounter errors (assuming you mean bugs in the processor)? — Abel, Sep 11 '15 at 21:57
Yes, I deleted the part where I wrote the XPath block; it was rubbish. I imported many unnecessary packages and wrote many unnecessary lines of code. I'm totally new to this; I've been trying for 4 hours already, but nothing seems to work. — Ruby, Sep 11 '15 at 22:02
Let's go back a step. Why don't you show us a small but relevant part of the input XML, and likewise of the expected output? Maybe XPath is not the right fit and you need XSLT. Many people use Java with XML technologies without a problem, but using a much harder technology, as you are doing now, makes it even worse (imo). — Abel, Sep 11 '15 at 22:04

score 2 · Accepted Answer · edited May 23 '17 at 11:51

i want to print all the data about the "Science Fiction" books.. i know i should use Xpath but it's stuck,

I assume you mean that you want all the books for which genre == "Science Fiction", right? In that case, XPath is much simpler than what you were trying in Java (you don't show the root node, so I'll start with '//', which selects at any depth):

//book[genre = 'Science Fiction']
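Since the question's code is already Java, here is a minimal sketch of evaluating that expression with the JDK's built-in javax.xml.xpath API, no extra library needed. The root element name `<books>` and the names `SciFiXPath` and `listSciFiBooks` are assumptions for illustration; to use the real file, parse `new File("books-fixed.xml")` instead of the inline sample. It prints every child element by its own name, so the `<auther>` spelling from the sample comes through unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SciFiXPath {

    // Returns one formatted block per Science Fiction book (hypothetical helper).
    public static String listSciFiBooks(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // let XPath do the filtering instead of walking every element by hand
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList books = (NodeList) xpath.evaluate(
                "//book[genre = 'Science Fiction']", doc, XPathConstants.NODESET);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            out.append("BOOK ").append(i + 1).append('\n');
            out.append("\tsequence:\t").append(book.getAttribute("sequence")).append('\n');
            // print every child element by name, whatever it is called
            NodeList children = book.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    out.append('\t').append(child.getNodeName()).append(":\t")
                       .append(child.getTextContent()).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // assumed root element <books>; the question does not show the real one
        String xml = "<books>"
                + "<book sequence=\"5\"><title>Aftershock</title><auther>Robert B. Reich</auther>"
                + "<publishYear>2010</publishYear><genre>Economics</genre></book>"
                + "<book sequence=\"6\"><title>The Time Machine</title><auther>H.G. Wells</auther>"
                + "<publishYear>1895</publishYear><genre>Science Fiction</genre></book>"
                + "</books>";
        System.out.print(listSciFiBooks(xml));
    }
}
```

Only the Science Fiction book ("The Time Machine") is printed, numbered as BOOK 1 because the counter runs over the XPath result rather than over all books.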

XSLT approach to simplify things

Now, having another look at your code, it looks like you want to print every element, including the element's name. This is easier to do in XSLT:

<!-- every XSLT 1.0 stylesheet must start like this -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- you want text -->
    <xsl:output method="text" />

    <!-- match any science fiction book (your primary goal) -->
    <xsl:template match="book[genre = 'Science Fiction']">

        <xsl:text>BOOK </xsl:text>
        <!-- number this book among its science fiction siblings only;
             position() here would also count whitespace nodes and other books -->
        <xsl:number count="book[genre = 'Science Fiction']" />

        <!-- send the children and attribute to be processed by templates -->
        <xsl:apply-templates select="@sequence | *" />

        <!-- end the last line, so the next BOOK starts on a new line -->
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <!-- ignore all other books (this plain pattern has a lower priority than the one above) -->
    <xsl:template match="book" />

    <!-- "catch" any elements or attributes under <book> -->
    <xsl:template match="book/* | book/@*">

        <!-- a newline and a tab per line-->
        <xsl:text>&#xA;&#9;</xsl:text>

        <!-- and the name of the element or attribute -->
        <xsl:value-of select="local-name()" />

        <!-- another tab, plus contents of the element or attribute -->
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select="." />
    </xsl:template>

    <!-- make sure that other values are ignored, but process children -->
    <xsl:template match="node()">
        <xsl:apply-templates />
    </xsl:template>

</xsl:stylesheet>

You can use this code, which is significantly shorter (if you ignore the comments and whitespace) and (arguably, once you get the hang of it) more readable than your original code. To use it:

Store it as books.xsl

Then, simply use this (copied and changed from here):

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class TestMain {
    public static void main(String[] args) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();

        // compile the stylesheet saved as books.xsl
        Source xslt = new StreamSource(new File("books.xsl"));
        Transformer transformer = factory.newTransformer(xslt);

        // run it over the input XML and write the text result to output.txt
        Source xml = new StreamSource(new File("books-fixed.xml"));
        transformer.transform(xml, new StreamResult(new File("output.txt")));
    }
}
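If you'd rather see the result on the console, as the original DOM program did, a small variation is to transform into a `StringWriter` and print it. This is a sketch assuming the same file names as above; the class name `BooksToConsole` and the `String` overload of `transform` (handy for trying a stylesheet on an inline snippet) are hypothetical additions, not part of the answer's code.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BooksToConsole {

    // Runs the stylesheet over the input and returns the text output as a String.
    public static String transform(Source xslt, Source xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);
        StringWriter out = new StringWriter();
        transformer.transform(xml, new StreamResult(out));
        return out.toString();
    }

    // Convenience overload for a stylesheet and document held in strings.
    public static String transform(String xslt, String xml) throws TransformerException {
        return transform(new StreamSource(new StringReader(xslt)),
                         new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws TransformerException {
        // same files as above, but print to the console instead of output.txt
        String result = transform(new StreamSource(new File("books.xsl")),
                                  new StreamSource(new File("books-fixed.xml")));
        System.out.print(result);
    }
}
```

Passing `new StreamResult(System.out)` straight to `transform` would also work; going through a `String` just makes the output easy to inspect or check in a test.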

XPath 2.0

If you can use Saxon in Java, the above becomes a single expression with XPath 2.0, and you don't even need XSLT:

for $book in //book[genre = 'Science Fiction']
return (
    'BOOK',
    count(//book[genre = 'Science Fiction'][. << $book]) + 1,
    for $tag in $book/(@sequence | *)
    return (local-name($tag), ':', string($tag))
)
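As for the title question (any way other than XPath?): plain DOM can do the same filtering, it just takes more code. A minimal sketch, assuming the `<book>`/`<genre>` layout from the question's sample; `SciFiDom`, `parse`, `hasChildWithText` and `listSciFiBooks` are hypothetical names. The `hasChildWithText` check mimics the XPath predicate `[genre = 'Science Fiction']`: true if any direct `<genre>` child has exactly that text.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SciFiDom {

    // Parses an XML document held in a string.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // True if <parent> has a direct child element <name> whose text equals value
    // (the same test as the XPath predicate [name = 'value']).
    static boolean hasChildWithText(Element parent, String name, String value) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(name)
                    && n.getTextContent().equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Formats every Science Fiction book, numbering only those books.
    public static String listSciFiBooks(Document doc) {
        StringBuilder out = new StringBuilder();
        NodeList books = doc.getElementsByTagName("book"); // <book> at any depth
        int counter = 0;
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (!hasChildWithText(book, "genre", "Science Fiction")) {
                continue; // skip books of any other genre
            }
            counter++;
            out.append("BOOK ").append(counter).append('\n');
            out.append("\tsequence:\t").append(book.getAttribute("sequence")).append('\n');
            // print each child element by name
            for (Node n = book.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    out.append('\t').append(n.getNodeName()).append(":\t")
                       .append(n.getTextContent()).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // the question's input file
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("books-fixed.xml"));
        System.out.print(listSciFiBooks(doc));
    }
}
```

The manual sibling walk in `hasChildWithText` is exactly what the one-line XPath predicate saves you, which is the main argument for XPath or XSLT here.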
