<?xml version="1.0" encoding="UTF-16"?>
<ABC>
    <END />
    <Tables>
        <START>
            <row>
                <id>111</id>
                <name>abc</name>
                <deptId>1</deptId>
            </row>
            <row>
                <id>112</id>
                <name>abc1</name>
                <deptId>1</deptId>
            </row>
            <row>
                <id>113</id>
                <name>abc3</name>
                <deptId>1</deptId>
            </row>
            <row>
                <id>222</id>
                <name>def</name>
                <deptId>2</deptId>
            </row>
            <row>
                <id>333</id>
                <name>pqr</name>
                <deptId>2</deptId>
            </row>
            <row>
                <id>444</id>
                <name>xyz</name>
                <deptId>2</deptId>
            </row>
            <row>
                <id>555</id>
                <name>lmn</name>
                <deptId>3</deptId>
            </row>
            <row>
                <id>555</id>
                <name>lmn</name>
                <deptId>3</deptId>
            </row>
        </START>
    </Tables>
</ABC>

I have an XML file with the above structure. I have to split it into three XML files, one per distinct deptId; that is, a new file should start whenever the value of the deptId element changes. All rows with the same deptId appear consecutively.

The required output is shown below. Ideally each output file is named after its department id.

The first XML file, named 1.xml:

<?xml version="1.0" encoding="UTF-16"?>
<ABC>
    <END />
    <Tables>
        <START>
            <row>
                <id>111</id>
                <name>abc</name>
                <deptId>1</deptId>
            </row>
            <row>
                <id>112</id>
                <name>abc1</name>
                <deptId>1</deptId>
            </row>
            <row>
                <id>113</id>
                <name>abc3</name>
                <deptId>1</deptId>
            </row>
        </START>
    </Tables>
</ABC>

The second XML file, named 2.xml:

<?xml version="1.0" encoding="UTF-16"?>
<ABC>
    <END />
    <Tables>
        <START>
            <row>
                <id>222</id>
                <name>def</name>
                <deptId>2</deptId>
            </row>
            <row>
                <id>333</id>
                <name>pqr</name>
                <deptId>2</deptId>
            </row>
            <row>
                <id>444</id>
                <name>xyz</name>
                <deptId>2</deptId>
            </row>
        </START>
    </Tables>
</ABC>

The third XML file, named 3.xml:

<?xml version="1.0" encoding="UTF-16"?>
<ABC>
    <END />
    <Tables>
        <START>
            <row>
                <id>555</id>
                <name>lmn</name>
                <deptId>3</deptId>
            </row>
            <row>
                <id>555</id>
                <name>lmn</name>
                <deptId>3</deptId>
            </row>
        </START>
    </Tables>
</ABC>

I tried the StAXSource approach, following the two questions linked below:

Split xml, Split large xml

Here is the code I tried:

import java.io.File;
import java.io.FileReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo2 {

public static void main(String[] args) throws Exception {
    XMLInputFactory xif = XMLInputFactory.newInstance();
    XMLStreamReader streamReader = xif.createXMLStreamReader(new FileReader("D://SmallXmltoSplit.xml"));

    streamReader.nextTag(); // Advance to next element
    streamReader.nextTag();
    streamReader.nextTag();
    streamReader.nextTag();
    streamReader.nextTag();
    streamReader.nextTag();

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    String deptId = null;
    File file = new File("D://test" + ".xml");
    while (streamReader.hasNext()) {
        if (streamReader.isStartElement()) {
            if (streamReader.getLocalName().equals("deptId")) {
                if (deptId == null) {
                    deptId = streamReader.getElementText();
                    file = new File("D://" + deptId + ".xml");
                    t.transform(new StAXSource(streamReader), new StreamResult(file));
                } else if (deptId != streamReader.getElementText()) {
                    file = new File("D://" + deptId + ".xml");
                    t.transform(new StAXSource(streamReader), new StreamResult(file));
                } 
            }
            t.transform(new StAXSource(streamReader), new StreamResult(file));
        }
        streamReader.next();
    }
}

}

Abhijit Bashetti

2 Answers


The XML reading should go by <row>, more or less as follows:

    XMLInputFactory xif = XMLInputFactory.newInstance();
    // Do not use a Reader, especially not a FileReader. An InputStream leaves the
    // encoding of the XML to the XMLStreamReader.
    InputStream in = Files.newInputStream(Paths.get("D:/SmallXmltoSplit.xml"));
    XMLStreamReader streamReader = xif.createXMLStreamReader(in);
    streamReader.nextTag();

    String id = "";
    String name = "";
    String deptId = "";

    String oldDeptId = null;

// File file = new File("D:/test" + ".xml");

    while (streamReader.hasNext()) {
        if (streamReader.isStartElement()) {
            switch (streamReader.getLocalName()) {
            case "row":
                id = "";
                name = "";
                deptId = "";
                break;
            case "id":
                id = streamReader.getElementText();
                break;
            case "name":
                name = streamReader.getElementText();
                break;
            case "deptId":
                deptId = streamReader.getElementText();
                break;
            }
        }
        if (streamReader.isEndElement()) {
            switch (streamReader.getLocalName()) {
            case "START":
                if (oldDeptId != null) {
                    saveDept();
                    //oldDeptId = deptId;
                }
                break;
            case "row":
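                if (!deptId.equals(oldDeptId)) {
                    if (oldDeptId != null) {
                        saveDept(); // Close the file of the previous department
                    }
                    startDept(deptId); // Open the file for the new department
                    oldDeptId = deptId; // Must also happen for the first row, when oldDeptId is still null
                }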
                appendDeptRow(id, name, deptId);
                break;
            }
        }
        streamReader.next(); // Advance to the next event; without this call the loop never ends
    }

The writing can be done without a transformation; in fact the output could simply be written as text.

I leave that as an exercise.
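For readers who want a starting point, here is a minimal sketch of one way the three helpers could look when the output is written as plain text. The class name DeptSplitter, the output directory parameter and the escape helper are assumptions made for illustration, not part of the answer above. The sketch also assumes that every deptId value is safe to use as a file name, that rows contain only id, name and deptId, and it writes the output as UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DeptSplitter {

    private final Path outDir; // Hypothetical: directory that receives 1.xml, 2.xml, ...
    private Writer out;        // File of the department currently being written, or null

    DeptSplitter(Path outDir) {
        this.outDir = outDir;
    }

    /** Opens <deptId>.xml and writes the fixed wrapper elements. */
    void startDept(String deptId) throws IOException {
        out = Files.newBufferedWriter(outDir.resolve(deptId + ".xml"), StandardCharsets.UTF_8);
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<ABC>\n    <END />\n    <Tables>\n        <START>\n");
    }

    /** Writes one row as text, escaping the characters that are special in XML content. */
    void appendDeptRow(String id, String name, String deptId) throws IOException {
        out.write("            <row>\n"
                + "                <id>" + escape(id) + "</id>\n"
                + "                <name>" + escape(name) + "</name>\n"
                + "                <deptId>" + escape(deptId) + "</deptId>\n"
                + "            </row>\n");
    }

    /** Closes the wrapper elements and the current file. */
    void saveDept() throws IOException {
        out.write("        </START>\n    </Tables>\n</ABC>\n");
        out.close();
        out = null;
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Reads the input row by row and starts a new file whenever deptId changes. */
    void split(InputStream in) throws IOException, XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        String id = "", name = "", deptId = "", oldDeptId = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (r.getLocalName()) {
                    case "id": id = r.getElementText(); break;
                    case "name": name = r.getElementText(); break;
                    case "deptId": deptId = r.getElementText(); break;
                    default: break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && "row".equals(r.getLocalName())) {
                if (!deptId.equals(oldDeptId)) {
                    if (out != null) {
                        saveDept();
                    }
                    startDept(deptId);
                    oldDeptId = deptId;
                }
                appendDeptRow(id, name, deptId);
                id = name = deptId = "";
            }
        }
        if (out != null) {
            saveDept(); // Close the file of the last department
        }
        r.close();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java DeptSplitter input.xml outputDirectory
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new DeptSplitter(Paths.get(args[1])).split(in);
        }
    }
}
```

Only one row is held in memory at a time, so this should also work for large inputs. A production version would additionally close the writer when an exception occurs.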

FileReader and FileWriter should not be used here, as they convert between bytes and characters using the default platform encoding instead of the encoding declared in the XML. The Files class has many convenient file functions.

Another specialty here is the UTF-16 encoding which doubles the size of an almost ASCII file. As you mention having a large file, it would be best to keep that file in UTF-8, probably even if the names are in Farsi, Greek, Japanese or Bulgarian.
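The two encoding points can be checked with a small experiment. The sketch below builds the same one-row document in UTF-16 and in UTF-8, compares the byte counts, and parses both from an InputStream so that the parser determines the encoding itself. The class name EncodingDemo and its helper methods are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class EncodingDemo {

    /** Builds a one-row sample document encoded in, and declaring, the given charset. */
    static byte[] sample(Charset cs) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?>"
                + "<row><id>111</id><name>abc</name><deptId>1</deptId></row>";
        return xml.getBytes(cs);
    }

    /** Parses the bytes from an InputStream and returns the text of the first name element. */
    static String firstName(byte[] xmlBytes) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(r.getLocalName())) {
                    return r.getElementText();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] utf16 = sample(StandardCharsets.UTF_16);
        byte[] utf8 = sample(StandardCharsets.UTF_8);
        System.out.println("UTF-16 bytes: " + utf16.length + ", UTF-8 bytes: " + utf8.length);
        // No charset is passed to the parser in either case.
        System.out.println(firstName(utf16) + " / " + firstName(utf8));
    }
}
```

For ASCII-only content such as this sample, the UTF-16 version takes two bytes per character plus a byte order mark, so it is at least twice as large as the UTF-8 version.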

Joop Eggen

It's much easier to do this with XSLT 2.0:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/">
    <xsl:for-each-group select="//row" group-adjacent="deptId">
      <xsl:result-document href="{current-grouping-key()}.xml">
        <ABC>
         <END />
          <Tables>
           <START>
            <xsl:copy-of select="current-group()"/>
           </START>
          </Tables>
        </ABC>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:transform>
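
For readers unfamiliar with group-adjacent: it puts consecutive items with the same key into one group, which matches the requirement that a new file starts whenever deptId changes. The following Java sketch mimics that behaviour on a plain list, purely as an illustration of the semantics; the class and method names are made up and it is not how an XSLT processor implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

class AdjacentGrouping {

    /** Splits items into runs of consecutive elements that share the same key. */
    static <T, K> List<List<T>> groupAdjacent(List<T> items, Function<T, K> key) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;
        K previous = null;
        for (T item : items) {
            K k = key.apply(item);
            // Start a new group at the first item and whenever the key changes
            if (current == null || !Objects.equals(k, previous)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(item);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        // deptId values in document order, as in the question's input
        List<String> deptIds = Arrays.asList("1", "1", "1", "2", "2", "2", "3", "3");
        System.out.println(groupAdjacent(deptIds, d -> d));
        // prints [[1, 1, 1], [2, 2, 2], [3, 3]]
    }
}
```

Unlike grouping by value, a key that reappears after a different key starts a new group, so this approach relies on rows with the same deptId being contiguous, as the question states.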

To run this from a Java application, you will want to download Saxon, and then invoke it for example with this logic:

    Processor proc = new Processor(false);
    XsltCompiler comp = proc.newXsltCompiler();
    XsltExecutable exp = comp.compile(new StreamSource(new File("my-stylesheet.xsl")));
    Serializer out = proc.newSerializer(new File("output.xml"));
    Xslt30Transformer trans = exp.load30();
    trans.applyTemplates(new StreamSource(new File("input.xml")), out);

More details here: http://www.saxonica.com/documentation/index.html#!using-xsl/embedding/s9api-transformation

Michael Kay