I am trying to parse an XML file using Java.

The XML file is only 256 KB. I am using a DOM parser to parse it. How can I parse XML content of this size?

Here's the method that parses the file content:

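import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public Document parse_a_string(StringBuffer decodedFile) {
    Document doc1 = null;
    try {
        DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
        DocumentBuilder db = factory.newDocumentBuilder();
        InputSource inStream = new InputSource();

        // problem here
        inStream.setCharacterStream(new StringReader(decodedFile.toString()));

        doc1 = db.parse(inStream);
    } catch (Exception e) {
        // Report the failure; an empty catch block hides parse errors
        e.printStackTrace();
    }
    return doc1;
}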

The file content is held in the StringBuffer object decodedFile, but StringReader accepts only a String.

the Tin Man
Mohan
  • StringBuffer has a toString() method to convert it to String. Check in JavaDoc before posting these questions – Aravind Yarram Feb 08 '12 at 16:35
  • Yes, but the decoded file is larger than a String can hold, so we need to use a StringBuffer. However, StringReader accepts only String objects. My problem is how to parse large content that is held in a StringBuffer. – Mohan Feb 08 '12 at 16:41
  • Are you getting any errors while parsing it, like OutOfMemory? It might be something as simple as changing the Java memory settings. – Spencer Kormos Feb 08 '12 at 16:41
  • possible duplicate of [How to read large XML file consisting of large number of small items efficiently in Java?](http://stackoverflow.com/questions/3653448/how-to-read-large-xml-file-consisting-of-large-number-of-small-items-efficiently) – Lukas Eder Feb 08 '12 at 16:46
  • No, I am not getting any exception. – Mohan Feb 08 '12 at 16:46
  • Also similar: http://stackoverflow.com/questions/7746950/parsing-very-large-xml-files-and-marshalling-to-java-objects, http://stackoverflow.com/questions/2301926/xml-process-large-data, http://stackoverflow.com/questions/3906892/parse-an-xml-string-in-java, http://stackoverflow.com/questions/355909/parsing-very-large-xml-documents-and-a-bit-more-in-java, etc. – Lukas Eder Feb 08 '12 at 16:47

5 Answers

For large documents (though I wouldn't call yours large) I'd use StAX.

helpermethod

Take a look at the JDOM XML parsing library. It's miles ahead of the native Java parsers, and in my opinion, quite superior.

For the code you provided, you actually have to walk the DOM tree and retrieve elements. See here or the official Java tutorial on working with XML for more information on working with XML documents.
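As a rough illustration of walking the DOM tree after parsing, here is a minimal sketch using only the JDK's built-in DOM API. The class name DomWalkSketch, the method textOf, and the "item" tag are made up for this example and do not come from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomWalkSketch {

    // Parses the XML text and collects the text content of every element
    // with the given tag name, in document order.
    public static List<String> textOf(String xml, String tagName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));

        List<String> values = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            values.add(element.getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document; "item" is an assumed tag name.
        String xml = "<items><item>a</item><item>b</item></items>";
        System.out.println(textOf(xml, "item")); // prints [a, b]
    }
}
```

In the question's method, the same loop could run on the returned Document instead of re-parsing a string.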

FloppyDisk

You might want to look at a StAX implementation like Woodstox. It lets you pull elements from the parser, instead of the parser pushing data into the app, and lets you pause parsing.
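To make the pull model concrete, here is a small sketch using the standard javax.xml.stream API that ships with the JDK; Woodstox implements the same interfaces, so it should work as a drop-in when it is on the classpath. The class name StaxPullSketch, the method countElements, and the sample XML are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {

    // Counts start tags with the given local name. The application asks
    // for each event with next(), so it decides when to advance or stop.
    public static int countElements(String xml, String localName) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document; "item" is an assumed tag name.
        String xml = "<items><item>a</item><item>b</item></items>";
        System.out.println(countElements(xml, "item")); // prints 2
    }
}
```

Because only the current event is held in memory, this approach does not build a tree of the whole document the way DOM does.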

Spencer Kormos

256 KB is a pretty small file nowadays: yesterday I was working with a 45 GB file, which is roughly 200,000 times larger!

It's not clear what your problem is. Any of the normal Java parsing techniques will work perfectly well. Which of them you use depends on why you are parsing the file and what you want to do with the data.

Having said that, many people seem to choose DOM by default because it is so well entrenched. However, more modern object models such as JDOM or XOM are much easier to work with.

Rob Kielty
Michael Kay
  • Could you please tell me what you used to parse that 45 GB file? I need to convert a large XML file, on the order of 40 to 50 GB, to TSV or CSV. How should I approach this? – dpsdce Feb 23 '12 at 10:38
  • I was using the streaming facilities in Saxon-EE, documented at http://www.saxonica.com/documentation/sourcedocs/streaming.xml – Michael Kay May 24 '12 at 07:30

Don't read the file into a String/StringReader and all that jazz. Parse the file directly via db.parse(new FileInputStream(...)). Reading the file into memory just wastes memory, and time.
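A minimal sketch of that suggestion, assuming the XML lives in a file on disk. The class name DirectParseSketch, the method parseFile, and the temporary sample file are hypothetical and only serve to make the example runnable.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DirectParseSketch {

    // Hands the byte stream straight to the parser, so the parser detects
    // the encoding itself and no intermediate String copy is created.
    public static Document parseFile(File xmlFile) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new FileInputStream(xmlFile)) {
            return db.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Write a small made-up document to a temp file for the demo.
        File tmp = File.createTempFile("sample", ".xml");
        Files.write(tmp.toPath(), "<root><child/></root>".getBytes(StandardCharsets.UTF_8));
        try {
            Document doc = parseFile(tmp);
            System.out.println(doc.getDocumentElement().getTagName()); // prints root
        } finally {
            tmp.delete();
        }
    }
}
```

If the content really does have to be decoded in memory first, as in the question, the StringReader route still works; this variant only applies when the source is a file or another stream.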

user207421