
I have a method that parses RSS feeds from different URLs, and it works great:

For example: https://www.clarin.com/rss/lo-ultimo/

But for one of these URLs (https://www.cio.com/category/mobile/index.rss), and for every RSS feed on that site, when I execute the code the console shows the following error and the parser doesn't work:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.

I'm parsing the RSS feeds with this method (only part of the code is shown):

        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

            URL url = new URL("https://www.cio.com/category/mobile/index.rss");
            URLConnection urlConnection = url.openConnection();
            InputStream inputStream = urlConnection.getInputStream();

            Document doc = dBuilder.parse(inputStream);

The error is thrown on the last line: Document doc = dBuilder.parse(inputStream);

In that code I'm parsing the RSS from the URL. The strange thing is that when I parse the RSS directly from the file (index.rss), I get no errors and the parsing works great. I do this using:

    File fXmlFile = new File("index.rss");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();

Things to note:

  • This is a Maven webapp project.
  • It is deployed on a Tomcat 9.0 server.
  • The method runs when I press a button on the web app's main page.

I mention this because when I tried the code in a simple Java project, parsing from the InputStream worked fine too.

I would really appreciate any help with this. Thanks!

1 Answer


I've run the following code and it works fine without errors.

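    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    // The class name is arbitrary; it only wraps the test method.
    public class RssTest {

        public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {

            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

            URL url = new URL("https://www.cio.com/category/mobile/index.rss");
            URLConnection urlConnection = url.openConnection();
            InputStream inputStream = urlConnection.getInputStream();

            Document doc = dBuilder.parse(inputStream);
            Element root = doc.getDocumentElement();
            NodeList children = root.getChildNodes();

            // Print every direct child node of the root element
            for (int i = 0; i < children.getLength(); i++) {
                System.out.println(children.item(i));
            }

            inputStream.close();
        }
    }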

Then I added the following and attempted to parse an empty file:

    File fXmlFile = new File("EmptyFile.xml");
    inputStream = new FileInputStream(fXmlFile);
    doc = dBuilder.parse(inputStream);
    System.out.println(doc.getDocumentElement());
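The same experiment can be reproduced in memory instead of with EmptyFile.xml. This is a minimal sketch: PrematureEndDemo and tryParse are hypothetical names, and the inputs are byte arrays rather than files, which should not change how DocumentBuilder treats them. It compares empty input, a document containing only the XML declaration, and a document with a root element.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class PrematureEndDemo {

    // Parses the given text and returns "root=<name>" on success,
    // or "error: <message>" if the parser rejects the input.
    static String tryParse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return "root=" + doc.getDocumentElement().getNodeName();
        } catch (SAXParseException e) {
            // The JDK's default error handler may also print a "[Fatal Error]" line to stderr
            return "error: " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse(""));                              // empty input
        System.out.println(tryParse("<?xml version=\"1.0\"?>"));       // declaration only, no root
        System.out.println(tryParse("<?xml version=\"1.0\"?><rss/>")); // has a root element
    }
}
```

Only the last input yields a root element; the first two are rejected by the parser because a well-formed document needs a root element.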

When the file was empty (or contained only the XML declaration), I received the same error you are getting. When I added a root element, the error disappeared. To me this suggests that the error occurs when inputStream (or whatever it is reading from) is essentially empty. This theory also seems to be supported by the question "org.xml.sax.SAXParseException: Premature end of file for *VALID* XML". So if you're still getting this error, I'd suggest putting a breakpoint on the line URL url = new URL(...) and stepping through to check that the connection is being made properly. Hope that helps.
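One way to see what the connection actually returns, without a debugger, is to download the whole body first and log the HTTP status and byte count before parsing. This is a sketch under assumptions: FeedFetchCheck and fetch are hypothetical names, and the User-Agent value is an illustrative guess, not something the cio.com feed is known to require.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class FeedFetchCheck {

    // Reads the full response body into memory and logs what the server sent,
    // so an empty or unexpected response is visible before the XML parser sees it.
    static byte[] fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        // Assumption: a hypothetical browser-like User-Agent, in case the site
        // treats Java's default User-Agent differently.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; RssReader)");

        if (conn instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) conn;
            // A 3xx code here means a redirect was not followed (HttpURLConnection does
            // not follow redirects between http and https). For 4xx/5xx codes,
            // getInputStream() below throws an IOException instead of returning data.
            System.err.println("HTTP " + http.getResponseCode()
                    + ", Content-Type=" + http.getContentType()
                    + ", Location=" + http.getHeaderField("Location"));
        }

        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            byte[] body = out.toByteArray();
            System.err.println("Read " + body.length + " bytes from " + url);
            return body;
        }
    }
}
```

Inside the webapp, the button's handler could call fetch(url), check the log, and only call dBuilder.parse(new ByteArrayInputStream(body)) when body.length > 0. Buffering the bytes also makes it easy to print the start of the response if it turns out not to be XML at all.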

  • I read your answer and then created a simple Java project to test my code in a static main method, and the code works fine. But in my original question I forgot to mention that this is a "Maven webapp project" deployed on Tomcat Server 9.0, and the method runs when I click a button on the web page. I don't know if this is related to the problem, but in the web project it doesn't work. – agrognetti Aug 26 '17 at 18:58
  • Ok. Well, that is obviously where the issue lies then... It's mid-evening here now and time for a beer, and it's a national holiday on Monday. But if I get a chance tomorrow, I'll take a look. If not, I'll take a look as soon as I can :-) In the meantime, good luck :-) – Mike Radley Aug 26 '17 at 19:18
  • Haha, OK Mike. In the meantime I'll try to find the solution, and if I do I'll post it. Thanks a lot! – agrognetti Aug 26 '17 at 19:44
  • The first thing I'd do is prove that the problem definitely lies with the input stream and not elsewhere. There's a class called PushbackInputStream (see http://tutorials.jenkov.com/java-io/pushbackinputstream.html). You could use it to check that there is something to read in the stream, and only parse the stream if there is. This should let you detect and handle the error. If our theory is correct and the input stream is empty, the next step is to think of a workaround for when that happens. That will get your code working again and buy you time to figure out why the stream sometimes fails :-) – Mike Radley Aug 27 '17 at 10:13
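A minimal sketch of that PushbackInputStream check, assuming a DocumentBuilder created as in the code above: it reads one byte, and if the stream is already at its end it returns an empty Optional instead of calling parse(). SafeFeedParser and parseIfNotEmpty are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Optional;

import javax.xml.parsers.DocumentBuilder;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class SafeFeedParser {

    // Peeks at the first byte so an empty stream can be reported
    // instead of letting parse() fail with "Premature end of file".
    static Optional<Document> parseIfNotEmpty(InputStream raw, DocumentBuilder builder)
            throws IOException, SAXException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        int first = in.read();
        if (first == -1) {
            return Optional.empty(); // nothing to read: the stream was empty
        }
        in.unread(first); // put the byte back so the parser sees the whole document
        return Optional.of(builder.parse(in));
    }
}
```

If it returns an empty Optional, the caller can log it and skip that feed. A stream that contains only whitespace, or an HTML error page, will still reach parse() and fail, so the SAXException still needs to be caught.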