
While looking for a way to split a huge XML file, I found a very nice solution using StAX and Transformer.transform(). Nice, BUT some tags get lost. Why is that?

The test input in the code below (four car elements) gives the following result. For every even-numbered element, the enclosing <car> tag is omitted:

Element: <?xml version="1.0" encoding="UTF-8"?><car><name>car1</name></car>
Element: <?xml version="1.0" encoding="UTF-8"?><name>car2</name>
Element: <?xml version="1.0" encoding="UTF-8"?><car><name>car3</name></car>
Element: <?xml version="1.0" encoding="UTF-8"?><name>car4</name>

How can I get the complete elements? Does this have to do with transform(s, r) interfering with how the input stream is read?

This is my code (I have seen the same pattern in many places, like this one). Using a StringReader or a FileReader makes no difference.

I expected this: loop { advance to the next start tag; get access to that element }. What I see instead: the 1st element is complete, the 2nd is only partially copied (just its <name> child), and this pattern repeats.

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

String testCars = "<root><car><name>car1</name></car><car><name>car2</name></car><car><name>car3</name></car><car><name>car4</name></car></root>";
String element = "car";
try {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader streamReader = factory.createXMLStreamReader(new StringReader(testCars));
    streamReader.nextTag(); // move to <root>
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    // advance to the next start tag, then copy that element with the Transformer
    while (streamReader.nextTag() == XMLStreamConstants.START_ELEMENT) {
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        t.transform(new StAXSource(streamReader), result);
        System.out.println("Element: " + writer.toString());
    }
} catch (Exception e) {
    e.printStackTrace();
}
tm1701
  • What do you believe the position of `streamReader` is after the call to `transform(...)`, and why do you believe that, i.e. where did you see that behavior *documented*? – Andreas Apr 26 '19 at 18:21
    Was that comment supposed to be an answer to my comment? I didn't ask what happens if you don't call `transform(...)`. I asked what *you* expected the position of the reader to be *after* the call, and I asked *why* you expected that, given that I cannot seem to find any documentation *specifying* what it will be, i.e. it appears to be *unspecified*. --- *"the code is shown as a solution in more places"* Like where, and how do you know it's valid. Just because you can find it on the web, doesn't mean it's true. – Andreas Apr 26 '19 at 18:28
  • OK, I updated the question with a sample link. This sample is also referenced in other places. What I expected is now in the question as well. – tm1701 Apr 26 '19 at 18:33
    Did you read the [last comment](https://stackoverflow.com/questions/5169978/split-1gb-xml-file-using-java/5170415#comment70090016_5170415) on that link you provided? – Andreas Apr 26 '19 at 23:19
  • Superb! Can you post this as an answer? – tm1701 Apr 27 '19 at 09:21
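To check Andreas's question empirically, here is a minimal sketch that copies the first child of the root with transform() and then reports which event the reader is positioned on. The class and method names (ReaderPositionDemo, positionAfterFirstTransform) are hypothetical, and the result reflects whichever TransformerFactory implementation is on the classpath, since the API does not appear to specify the position. The question's output is consistent with the JDK's built-in transformer leaving the reader on the next <car> start tag, so the loop's nextTag() call then moves on into <name> and only that part of the second element gets copied.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical demo class: shows where the reader is left after one transform() call.
class ReaderPositionDemo {

    static String positionAfterFirstTransform(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        reader.nextTag(); // move to <root>
        reader.nextTag(); // move to the first child's start tag
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new StAXSource(reader), new StreamResult(new StringWriter()));

        // Describe the event the reader is positioned on now (implementation-dependent).
        if (reader.isStartElement()) {
            return "START_ELEMENT " + reader.getLocalName();
        }
        if (reader.isEndElement()) {
            return "END_ELEMENT " + reader.getLocalName();
        }
        return "event type " + reader.getEventType();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><car><name>car1</name></car><car><name>car2</name></car></root>";
        System.out.println(positionAfterFirstTransform(xml));
    }
}
```

If this prints a START_ELEMENT for the second car, any loop that calls nextTag() unconditionally after transform() will skip that start tag.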

1 Answer


Thanks to Andreas, this is the solution:

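import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

String testCars = "<root><car><name>car1</name></car><other><something>Unknown</something></other><car><name>car2</name></car></root>";
XMLInputFactory factory = XMLInputFactory.newInstance();
try {
    XMLStreamReader streamReader = factory.createXMLStreamReader(new StringReader(testCars));
    streamReader.nextTag(); // move to <root>
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    streamReader.nextTag(); // move to the first child's start tag
    while (streamReader.isStartElement()) {
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        t.transform(new StAXSource(streamReader), result);
        System.out.println("XmlElement: " + writer.toString());
        // transform() has already moved the reader past the copied element (observed with the
        // JDK's built-in transformer, not specified by the API), so don't call nextTag() blindly.
        // Only skip whitespace/comments until the next start tag or the root's end tag.
        while (!streamReader.isStartElement() && !streamReader.isEndElement() && streamReader.hasNext()) {
            streamReader.next();
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}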

Input is (pretty-printed here; the code uses the same document on a single line):

<root>
  <car>
    <name>car1</name>
  </car>
  <other>
    <something>Unknown</something>
  </other>
  <car>
    <name>car2</name>
  </car>
</root>

Output is:

XmlElement: <?xml version="1.0" encoding="UTF-8"?><car><name>car1</name></car>
XmlElement: <?xml version="1.0" encoding="UTF-8"?><other><something>Unknown</something></other>
XmlElement: <?xml version="1.0" encoding="UTF-8"?><car><name>car2</name></car>
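The loop above still depends on where transform() happens to leave the reader. As an alternative (a different technique, not what the answer or the linked comment uses), here is a sketch that copies each direct child of the root event by event with XMLEventReader and XMLEventWriter, tracking the nesting depth, so the loop alone decides when the reader advances. EventCopySplitter and split are hypothetical names, and each part is buffered in memory, which assumes the individual child elements are reasonably small even if the whole file is huge.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: splits the direct children of the root element into separate strings.
class EventCopySplitter {

    static List<String> split(String xml) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        List<String> parts = new ArrayList<>();
        int depth = 0;              // current nesting level; 2 = a direct child of the root
        StringWriter buffer = null; // non-null only while a child subtree is being copied
        XMLEventWriter out = null;

        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 2) { // a direct child of the root starts: open a new output
                    buffer = new StringWriter();
                    out = outFactory.createXMLEventWriter(buffer);
                }
            }
            if (out != null) {
                out.add(event); // copy every event of the current child, including its end tag
            }
            if (event.isEndElement()) {
                if (depth == 2 && out != null) { // the child just ended: finish this part
                    out.flush();
                    out.close();
                    parts.add(buffer.toString());
                    out = null;
                    buffer = null;
                }
                depth--;
            }
        }
        in.close();
        return parts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><car><name>car1</name></car><other><something>Unknown</something></other>"
                + "<car><name>car2</name></car></root>";
        for (String part : split(xml)) {
            System.out.println("XmlElement: " + part);
        }
    }
}
```

Because no StartDocument event is written, each part comes out without the `<?xml ...?>` declaration that the Transformer adds. Whitespace between the root's children is never copied, so pretty-printed input works the same way.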
tm1701