My requirement is: I have a 1 GB XML file and want to remove a few nodes from it. The nodes to be removed can appear anywhere in the file and depend on the input. What is the best parser for this in Java? I'm currently using the DOM parser, and it works fine for 100 MB files, but it throws an OutOfMemoryError (Java heap space) for the 1 GB file. Can anyone suggest a better approach for my code below?
import java.io.ByteArrayOutputStream;
import java.io.File;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public static void main(String[] args) {
    DocumentBuilder docBuilder = null;
    File inputFile = new File("/scratch/bigfile/final.txt");
    // Parse the XML file using the DOM parser
    try {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        docBuilderFactory.setExpandEntityReferences(false);
        docBuilderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(inputFile);

        // Remove unwanted nodes from the XML document
        Element element1 = (Element) doc.getElementsByTagName("G_SUMMARY_ROWSET").item(0);
        element1.getParentNode().removeChild(element1);
        Element element2 = (Element) doc.getElementsByTagName("G_JRNLSOURCE_ROWSET").item(0);
        element2.getParentNode().removeChild(element2);
        Element element3 = (Element) doc.getElementsByTagName("G_JRNLSOURCE_UNMATCHED_ROWSET").item(0);
        element3.getParentNode().removeChild(element3);
        Element element4 = (Element) doc.getElementsByTagName("G_JRNLDETAILS_UNMATCHED_ROWSET").item(0);
        element4.getParentNode().removeChild(element4);

        // Convert the DOM document to a byte array
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        StreamResult result = new StreamResult(bos);
        transformer.transform(source, result);
        byte[] array = bos.toByteArray();
        System.out.println(array.length);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
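
For reference, would a streaming (StAX) approach like the sketch below be the right direction? It is only a rough sketch: it copies events straight from an XMLEventReader to an XMLEventWriter and skips the four ROWSET subtrees, so the whole document is never held in memory. The output path final_filtered.xml is just a placeholder I made up.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public static void main(String[] args) throws Exception {
    // Element names whose subtrees should be dropped (same names as in the DOM version)
    Set<String> skip = new HashSet<>(Arrays.asList(
            "G_SUMMARY_ROWSET",
            "G_JRNLSOURCE_ROWSET",
            "G_JRNLSOURCE_UNMATCHED_ROWSET",
            "G_JRNLDETAILS_UNMATCHED_ROWSET"));

    XMLInputFactory inFactory = XMLInputFactory.newInstance();
    inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    XMLOutputFactory outFactory = XMLOutputFactory.newInstance();

    try (FileInputStream in = new FileInputStream("/scratch/bigfile/final.txt");
         // Placeholder output file; the result is streamed here instead of into a byte array
         FileOutputStream out = new FileOutputStream("/scratch/bigfile/final_filtered.xml")) {

        XMLEventReader reader = inFactory.createXMLEventReader(in);
        XMLEventWriter writer = outFactory.createXMLEventWriter(out);

        int skipDepth = 0; // > 0 while we are inside a subtree that is being dropped
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && skip.contains(event.asStartElement().getName().getLocalPart())) {
                skipDepth++;       // entering a subtree to drop
            }
            if (skipDepth == 0) {
                writer.add(event); // copy everything outside dropped subtrees
            }
            if (event.isEndElement()
                    && skip.contains(event.asEndElement().getName().getLocalPart())
                    && skipDepth > 0) {
                skipDepth--;       // leaving the dropped subtree
            }
        }
        writer.flush();
        writer.close();
        reader.close();
    }
}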