
I have hit somewhat of a roadblock. My goal is to filter out everything except the number.

Here is the XML file:

<?xml version="1.0" encoding="utf-8" ?>
<orders>
  <order>
     <stuff>"Some random information and # 123456"</stuff>
  </order>
</orders>

Here is my incomplete code. I don't know how to find the number, nor how to go about making the change I want.

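import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

    public static void main(String[] argv) {
        try {
            // Read the file and parse it into a DOM tree
            File inputFile = new File("C:\\filepath...\\asdf.xml");
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(inputFile);

            // I don't know where to go from there
            NodeList filter = doc.getChildNodes();

            // Serialize the (still unmodified) DOM to the console
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            DOMSource source = new DOMSource(doc);
            StreamResult consoleResult = new StreamResult(System.out);
            transformer.transform(source, consoleResult);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }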
Squary94
  • Look into Xpath, https://stackoverflow.com/questions/2811001/how-to-read-xml-using-xpath-in-java You can get the value of stuff by doing something similar to /orders/order/stuff – JCompetence Aug 16 '21 at 09:34
  • As you use an XSLT `Transformer`, why not write an XSLT stylesheet that does the job? It is not clear whether you want a new XML document with the previous structure but `"Some random information and # 123456"` transformed to `123456`, or solely a number as the result, but XSLT can do both. – Martin Honnen Aug 16 '21 at 09:40
  • How would I go about doing that with XSLT then? I am somewhat new to all of this. – Squary94 Aug 16 '21 at 09:44

1 Answer


When you use

Transformer transformer = transformerFactory.newTransformer();

the transformer is an "identity transformer" - it copies the input to the output with no change. In effect you're using the identity transformer here for serialization only, to convert the DOM to lexical XML.

If you want to make actual changes to the XML content, you have two choices: either write Java code to modify the in-memory DOM tree before serialising it, or write XSLT code so your Transformer is doing a real transformation not just an identity transformation. XSLT is almost certainly the better approach except that it involves more of a learning curve.
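As a rough sketch of the first choice (modifying the DOM in Java before serialising it), something like the following might work. The class name DomNumberFilter, the method keepNumbers and the regular expression are my own assumptions, not anything from your code; in particular I am assuming the number you want is the first run of digits after a "#".

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomNumberFilter {

    // Assumption: the wanted number is the first run of digits after a '#'.
    private static final Pattern NUMBER_AFTER_HASH = Pattern.compile("#\\s*(\\d+)");

    /** Parses the XML, rewrites each <stuff> element in memory, then serializes the DOM. */
    public static String keepNumbers(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList stuffNodes = doc.getElementsByTagName("stuff");
        for (int i = 0; i < stuffNodes.getLength(); i++) {
            Element stuff = (Element) stuffNodes.item(i);
            Matcher m = NUMBER_AFTER_HASH.matcher(stuff.getTextContent());
            if (m.find()) {
                // Replace the element's text with just the captured digits.
                stuff.setTextContent(m.group(1));
            }
        }

        // The identity transformer is used only to serialize the modified DOM.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><stuff>\"Some random information and # 123456\""
                + "</stuff></order></orders>";
        System.out.println(keepNumbers(xml));
    }
}
```

Elements whose text contains no "#" followed by digits are left unchanged here; whether that is the right behaviour depends on what else can appear in your documents.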

I'm not sure exactly what output you want, which makes it difficult to give you working code. The phrase "filter out" is unfortunately ambiguous: when people say "I want to filter out X" they sometimes mean they want to remove X, and sometimes they mean they want to remove everything except X. Also, "removing the number" isn't a complete specification unless we know all possibilities of what might appear in your document. For example, is the number always preceded by "#", or is that only the case in this one example input? But one approach would be to remove all digits, which you could do with a call on translate(., '0123456789', '').
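To get a feel for what translate() does, you could evaluate it from Java through the standard javax.xml.xpath API. This is only a sketch: the class TranslateDemo and its method names are made up for illustration, and the path /orders/order/stuff is taken from your sample file. The second method assumes you want the opposite reading of "filter out" (keep only the digits after the "#"), using the common trick of calling translate() twice.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class TranslateDemo {

    // Path to the element in the sample document from the question.
    private static final String PATH = "/orders/order/stuff";

    /** Evaluates an XPath 1.0 expression against the XML and returns its string value. */
    public static String evaluate(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    /** Removes every digit: characters in the second argument with no replacement are dropped. */
    public static String removeDigits(String xml) throws Exception {
        return evaluate(xml, "translate(" + PATH + ", '0123456789', '')");
    }

    /**
     * Keeps only the digits found after the first '#'.
     * The inner translate() yields all non-digit characters of the tail;
     * the outer translate() then deletes exactly those characters.
     */
    public static String digitsAfterHash(String xml) throws Exception {
        String tail = "substring-after(" + PATH + ", '#')";
        return evaluate(xml,
                "translate(" + tail + ", translate(" + tail + ", '0123456789', ''), '')");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><stuff>\"Some random information and # 123456\""
                + "</stuff></order></orders>";
        System.out.println(removeDigits(xml));
        System.out.println(digitsAfterHash(xml));
    }
}
```

Note that digitsAfterHash concatenates every digit that appears after the "#", so it only gives a sensible answer if nothing but the number contains digits in that part of the text.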

Note that if you're using XSLT you don't need to construct a DOM first; in fact, it's a waste of time and space. Just supply the lexical XML as input to the transformer, in the form of a StreamSource.
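A minimal sketch of the XSLT route, assuming you want the same document structure with the text of each stuff element reduced to the digits after the "#". The stylesheet is XSLT 1.0 (which is what the transformer built into the JDK supports) and is embedded as a string only to keep the example self-contained; the class name XsltNumberFilter and the method transform are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltNumberFilter {

    // Identity template copies everything; the second template overrides
    // the text inside <stuff> and keeps only the digits after the '#'.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
          + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "  <xsl:output omit-xml-declaration='yes'/>"
          + "  <xsl:template match='@*|node()'>"
          + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "  </xsl:template>"
          + "  <xsl:template match='stuff/text()'>"
          + "    <xsl:variable name='tail' select=\"substring-after(., '#')\"/>"
          + "    <xsl:value-of select=\"translate($tail,"
          + "        translate($tail, '0123456789', ''), '')\"/>"
          + "  </xsl:template>"
          + "</xsl:stylesheet>";

    /** Transforms lexical XML directly, without building a DOM first. */
    public static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><stuff>\"Some random information and # 123456\""
                + "</stuff></order></orders>";
        System.out.println(transform(xml));
    }
}
```

To read from your file instead of a string, pass new StreamSource(inputFile) as the source of the transformation, and use new StreamResult(System.out) if you want the result on the console as in your original code.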

Michael Kay
  • I want just the number after the #, so 123456. I have no experience with XSLT; what could a possible solution look like? – Squary94 Aug 16 '21 at 16:47
  • My initial idea was to take the contents of that node, put it in a string and then filter the string to my specifications and replace the node with the altered string. But I don't really understand how since the keywords are so confusing to me. – Squary94 Aug 16 '21 at 16:54