I am trying to parse 11384 XML files into one SQLite database. One of them:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (C) 2009/2010/2011 Ulrich Apel.
This work is distributed under the conditions of the Creative Commons
Attribution-Share Alike 3.0 Licence. This means you are free:
* to Share - to copy, distribute and transmit the work
* to Remix - to adapt the work
Under the following conditions:
* Attribution. You must attribute the work by stating your use of KanjiVG in
your own copyright header and linking to KanjiVG's website
(http://kanjivg.tagaini.net)
* Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same or similar license to this
one.
See http://creativecommons.org/licenses/by-sa/3.0/ for more details.
-->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" [
<!ATTLIST g
xmlns:kvg CDATA #FIXED "http://kanjivg.tagaini.net"
kvg:element CDATA #IMPLIED
kvg:variant CDATA #IMPLIED
kvg:partial CDATA #IMPLIED
kvg:original CDATA #IMPLIED
kvg:part CDATA #IMPLIED
kvg:number CDATA #IMPLIED
kvg:tradForm CDATA #IMPLIED
kvg:radicalForm CDATA #IMPLIED
kvg:position CDATA #IMPLIED
kvg:radical CDATA #IMPLIED
kvg:phon CDATA #IMPLIED >
<!ATTLIST path
xmlns:kvg CDATA #FIXED "http://kanjivg.tagaini.net"
kvg:type CDATA #IMPLIED >
]>
<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" viewBox="0 0 109 109">
<g id="kvg:StrokePaths_0ff01" style="fill:none;stroke:#000000;stroke-width:3;stroke-linecap:round;stroke-linejoin:round;">
<g id="kvg:0ff01">
<path id="kvg:0ff01-s1" d="M54.5,15.79c0,6.07-0.29,55.49-0.29,60.55"/>
<path id="kvg:0ff01-s2" d="M54.5,88 c -0.83,0 -1.5,0.67 -1.5,1.5 0,0.83 0.67,1.5 1.5,1.5 0.83,0 1.5,-0.67 1.5,-1.5 0,-0.83 -0.67,-1.5 -1.5,-1.5"/>
</g>
</g>
<g id="kvg:StrokeNumbers_0ff01" style="font-size:8;fill:#808080">
<text transform="matrix(1 0 0 1 45 16)">1</text>
<text transform="matrix(1 0 0 1 45 88)">2</text>
</g>
</svg>
I'm using SAX parser:
public class SaxKanjivgHandler extends DefaultHandler {
.....
File folder = new File(KANJIVG_DIRECTORY);
if (folder.isDirectory()) {
File[] listOfFiles = folder.listFiles();
for (File file : listOfFiles) {
if (file.isFile()) {
currentFileName = file.getName();
readXmlFromFile(file);
}
}
}
.....
public void readXmlFromFile(File file) throws ParserConfigurationException,
SAXException, IOException {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
parser.parse(file, this);
}
When I am parsing files, I am getting this error:
Exception in thread "main" java.net.SocketException: Connection reset at java.net.SocketInputStream.read(Unknown Source) at java.net.SocketInputStream.read(Unknown Source) at java.io.BufferedInputStream.fill(Unknown Source) at java.io.BufferedInputStream.read1(Unknown Source) at java.io.BufferedInputStream.read(Unknown Source) at sun.net.www.MeteredStream.read(Unknown Source) at java.io.FilterInputStream.read(Unknown Source) at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source) at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipSpaces(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanEntityDecl(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDecls(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDTDExternalSubset(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at javax.xml.parsers.SAXParser.parse(Unknown Source) at SaxKanjivgHandler.readXmlFromFile(SaxKanjivgHandler.java:63) at SaxKanjivgHandler.(SaxKanjivgHandler.java:44) at Main.main(Main.java:28)
Firstly, I thought that this error was because of one exact file. But an error is happening with different files in different times. How to make SAX parser stop connecting to the Internet?