My program uses javax.xml.stream.XMLStreamReader
to run a StAX parse over a very large XML file from Wiktionary (roughly 4 GB).
It works fine for a very long sequence of tags and content, then throws a strange exception:
java.lang.ArrayIndexOutOfBoundsException: 8192
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:546)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(XMLEntityScanner.java:1629)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1667)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1755)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2965)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:558)
at XmlParser.getAllTitles(XmlParser.java:36)
at Main.main(Main.java:9)
The tag on which the exception is thrown looks perfectly normal (</username>), so I can't understand why it fails.
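For reference, here is a minimal sketch of the kind of StAX loop involved. It is not the actual XmlParser code: the class name TitleScanSketch, the <title> element name, and the tiny sample document are assumptions for illustration. Only the method name getAllTitles is taken from the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for the real XmlParser class.
public class TitleScanSketch {

    // Collects the text content of every <title> element in the stream.
    static List<String> getAllTitles(InputStream in) throws XMLStreamException {
        List<String> titles = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // next() is the call that appears in the stack trace.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the text up to the matching end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, loosely shaped like a wiki dump.
        String xml = "<mediawiki><page><title>cat</title>"
                + "<contributor><username>someone</username></contributor></page>"
                + "<page><title>dog</title></page></mediawiki>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(getAllTitles(in));
    }
}
```

On a small input like this the loop completes normally. The failure only shows up deep into the real file.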
I read online that other people have run into this too, and it seems that fixing it requires updating the Xerces version.
My current Xerces version is: Xerces-J 2.7.1
I use this version of Java:
java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.4) (7u51-2.4.4-0ubuntu0.13.10.1)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
What I need is either to make Xerces 2.7 work somehow or to update the version used by OpenJDK.
I searched extensively for a solution but found nothing, so I don't know how to proceed in either case.
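One thing that may be worth checking first: the frames in the stack trace are under com.sun.org.apache.xerces.internal, which is the copy of Xerces bundled inside the JDK, so a standalone Xerces-J jar may not be the parser that is actually running. Below is a minimal sketch for printing which StAX factory gets selected at runtime. The class name StaxImplCheck and the helper inputFactoryClassName are illustrative names, not part of any library.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper class for diagnosing which StAX implementation is loaded.
public class StaxImplCheck {

    // Returns the fully qualified class name of the XMLInputFactory that was selected.
    static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("XMLInputFactory: " + inputFactoryClassName());
        System.out.println("java.version: " + System.getProperty("java.version"));
    }
}
```

If the printed class lives in a com.sun package, the JDK's built-in implementation is in use. A third-party StAX implementation on the classpath would normally show up under its own package name instead.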