I am using XMLStreamReader to achieve my goal(splitting xml file). It looks good, but still does not give the desired result. My aim is to split every node "nextTag" from an input file:
<?xml version="1.0" encoding="UTF-8"?>
<firstTag>
<nextTag>1</nextTag>
<nextTag>2</nextTag>
</firstTag>
The outcome should look like this:
<?xml version="1.0" encoding="UTF-8"?><nextTag>1</nextTag>
<?xml version="1.0" encoding="UTF-8"?><nextTag>2</nextTag>
Referring to Split 1GB Xml file using Java I achieved my goal with this code:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
public class Demo4 {
public static void main(String[] args) throws Exception {
InputStream inputStream = new FileInputStream("input.xml");
BufferedReader in = new BufferedReader(new InputStreamReader(inputStream));
XMLInputFactory factory = XMLInputFactory.newInstance();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
XMLStreamReader streamReader = factory.createXMLStreamReader(in);
while (streamReader.hasNext()) {
streamReader.next();
if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT
&& "nextTag".equals(streamReader.getLocalName())) {
StringWriter writer = new StringWriter();
t.transform(new StAXSource(streamReader), new StreamResult(
writer));
String output = writer.toString();
System.out.println(output);
}
}
}
}
Actually very simple. But, my input file is in form from a single line:
<?xml version="1.0" encoding="UTF-8"?><firstTag><nextTag>1</nextTag><nextTag>2</nextTag></firstTag>
My Java code does not produce the desired output anymore, instead just this result:
<?xml version="1.0" encoding="UTF-8"?><nextTag>1</nextTag>
After spending hours, I am pretty sure to already find out the reason:
t.transform(new StAXSource(streamReader), new StreamResult(writer));
It is because, after the transform method being executed, the cursor will automatically moved forward to the next event. And in the code, I have this fraction:
while (streamReader.hasNext()) {
streamReader.next();
...
t.transform(new StAXSource(streamReader), new StreamResult(writer));
...
}
After the first transform, the streamReader gets directly 2 times next():
1. from the transform method
2. from the next method in the while loop
So, in case of this specific line XML, the cursor can never achive the second open tag . In opposite, if the input XML has a pretty print form, the second can be reached from the cursor because there is a space-event after the first closing tag
Unfortunately, I could not find anything how to do settings, so that the transformator does not automatically spring to next event after performing the transform method. This is so frustating.
Does anybody have any idea how I can deal with it? Also semantically is very welcome. Thank you so much.
Regards,
Ratna
PS. I can surely write a workaround for this problem(pretty print the xml document before transforming it, but this would mean that the input xml was being modified before, this is not allowed)