
I am working on an XML processing project. The goal is to take a list of keyterms and descriptions from a document written like so:

<?xml version="1.0" encoding="utf-8"?>
<wordmatch>
    <exercise>
        <key>
        <keyterm>Loop Body</keyterm>
        <decription>is the part of the body that contains the statements to be repeated</decription>
        </key>
        <key>
        <keyterm>Iteration</keyterm>
        <decription>is one time execution of the loop body.</decription>
        </key>
        <key>
        <keyterm>Loop Condition</keyterm>
        <decription>Description</decription>
        </key>
        <key>
        <keyterm>Infinite Loop</keyterm>
        <decription>Description</decription>
        </key>
        <key>
        <keyterm>Sentinel Value</keyterm>
        <decription>Description</decription>
        </key>
        <key>
        <keyterm>Off-by-one</keyterm>
        <decription>is an error in the program that causes the loop body to be executed one more or less time.</decription>
        </key>
        <key>
        <keyterm>Input Redirection</keyterm>
        <decription>is to redirect the input from a data file rather than from the keyboard. The file is specified at the command line after the symbol &lt;.</decription>
        </key>
        <!-- <key>
        <keyterm>Output Redirection</keyterm>
        <decription>is to redirect the output to a data file rather than to the console. The file is specified at the command line after the symbol >.</decription>
        </key> -->
    </exercise>
    <filename>Section5_2.html</filename>
</wordmatch>

I then want to collect the keyterms and descriptions into their respective arrays. I've written the following method for parsing the XML document:

private static InputStream readXmlFileIntoInputStream(final String fileName) {
        System.out.println("processing:" + fileName);
        return Main.class.getClassLoader().getResourceAsStream(fileName);
    }
    
    public static Document readXML(String fileName) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try (InputStream inputStream = new FileInputStream(fileName)) {
            // Enable secure processing to help prevent XML injection attacks
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(inputStream);
            doc.getDocumentElement().normalize();
        }
        catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
        // doc is the static field declared in the class; it is not updated if parsing failed
        return doc;
    }

The method for processing the XML document takes the parsed Document as an argument:

public static void processKeyTermsAndDescriptions(Document doc) {
        System.out.println(doc.getDocumentElement().getNodeName());
        
        //TODO: add keyterms to array.
        
        
        //TODO: add descriptions to array.
        
    }

A minimal reproducible example is the following:

private static final String FILENAME = "./src/resources/exercise.xml";
    private static Document doc;
    private static ArrayList<String> keyterms = new ArrayList<>();
    private static ArrayList<String> descriptions = new ArrayList<>();
    
    public static void main(String[] args) {
        System.out.println("Step 1 read xml document");
        Document doc = readXML(FILENAME);
        
        System.out.println("Step 2 for each key add keyterm and description to respective arrays");
        processKeyTermsAndDescriptions(doc);
        
        System.out.println("Step 3 print header");
        String header = printHeader();
        
        System.out.println("Step 4 print body w/ keyterms and descriptions");
        String body = printBodyWithKeyTermsAndDescriptions(keyterms,descriptions);
        
        System.out.println("Step 5 print footer");
        String footer = printFooter(keyterms,descriptions);
        
        System.out.println(header);
        System.out.println(body);
        System.out.println(footer);
        
    }

I would like to figure out how to loop over the elements in the XML document so that the keyterms and descriptions end up in their respective arrays.
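
For illustration, one possible way to do this with the DOM API is to select every `key` element and read its two children. This is only a minimal sketch, not a definitive solution: the class name `KeyTermDomSketch` and the helpers `parse`, `collect` and `childText` are hypothetical names introduced here, the XML is passed as a string instead of being read from `exercise.xml`, and the tag is spelled `decription` because that is how it appears in the sample file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class KeyTermDomSketch {

    /** Hypothetical helper: parses an XML string into a DOM Document. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Adds one entry to each list per <key> element, so the indexes stay aligned. */
    public static void collect(Document doc, List<String> keyterms, List<String> descriptions) {
        // getElementsByTagName returns only element nodes; commented-out keys are skipped
        NodeList keys = doc.getElementsByTagName("key");
        for (int i = 0; i < keys.getLength(); i++) {
            Element key = (Element) keys.item(i);
            keyterms.add(childText(key, "keyterm"));
            descriptions.add(childText(key, "decription")); // spelled as in the sample file
        }
    }

    /** Returns the trimmed text of the first matching descendant, or "" if it is missing. */
    public static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for exercise.xml
        String xml = "<wordmatch><exercise>"
                + "<key><keyterm>Loop Body</keyterm><decription>Description</decription></key>"
                + "<key><keyterm>Iteration</keyterm><decription>Description</decription></key>"
                + "</exercise><filename>Section5_2.html</filename></wordmatch>";
        List<String> keyterms = new ArrayList<>();
        List<String> descriptions = new ArrayList<>();
        collect(parse(xml), keyterms, descriptions);
        System.out.println(keyterms);
        System.out.println(descriptions);
    }
}
```

Reading both children inside the same loop iteration keeps `keyterms.get(i)` and `descriptions.get(i)` paired, even if a `key` is missing one of its children.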

marc_s
Evan Gertis
  • If you only want to parse the XML document, I suggest a SAX parser. It is an event-based stream parser that fires a callback for every XML element (attributes are delivered with the element's start event). With a SAX parser it is not only easier to parse the file, it also uses fewer resources than a DOM parser, which builds the entire DOM tree in memory. – Stefan D. Jan 24 '22 at 09:18
  • Also, if you want to use a DOM parser, [this answer](https://stackoverflow.com/a/5511298/5564903) may help you. – Stefan D. Jan 24 '22 at 09:29
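
The SAX approach suggested in the comments could look roughly like the sketch below. It is an illustration under assumptions rather than a tested solution for the project: `KeyTermSaxSketch` and its static `parse` helper are hypothetical names, the XML is passed as a string, and the handler matches on the tag names exactly as they are spelled in the sample file (`keyterm`, `decription`).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class KeyTermSaxSketch extends DefaultHandler {

    public final List<String> keyterms = new ArrayList<>();
    public final List<String> descriptions = new ArrayList<>();

    // Collects character data for the element currently being read
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start with an empty buffer for every new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of assigning
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("keyterm".equals(qName)) {
            keyterms.add(text.toString().trim());
        } else if ("decription".equals(qName)) { // spelled as in the sample file
            descriptions.add(text.toString().trim());
        }
    }

    /** Hypothetical helper: runs the handler over an XML string and returns it. */
    public static KeyTermSaxSketch parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        KeyTermSaxSketch handler = new KeyTermSaxSketch();
        factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for exercise.xml
        String xml = "<wordmatch><exercise>"
                + "<key><keyterm>Loop Body</keyterm><decription>Description</decription></key>"
                + "</exercise></wordmatch>";
        KeyTermSaxSketch result = parse(xml);
        System.out.println(result.keyterms);
        System.out.println(result.descriptions);
    }
}
```

Unlike the DOM approach, this version never holds the whole tree in memory, but the two lists are only aligned if every `key` contains both children.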
