
I am trying to parse an XML response, but I am failing miserably. At first I thought the XML was just not being returned in the response, so I wrote the code below with a direct link to my XML file online. I can print the XML to the screen with no problems. However, when I call my parse method I get "Premature end of file."

It works if I pass the URL directly:

  • builder.parse("");

but fails when I pass an InputStream:

  • builder.parse(connection.getInputStream());

    try {
        URL url = new URL(xml);
        URLConnection uc = url.openConnection();
        HttpURLConnection connection = (HttpURLConnection) uc;

        connection.setDoInput(true);
        connection.setDoOutput(true);

        InputStream instream;
        InputSource source;
        //get XML from InputStream
        if (connection.getResponseCode() >= 200) {
            connection.connect();
            instream = connection.getInputStream();
            parseDoc(instream);
        }
        else {
            instream = connection.getErrorStream();
        }

    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (SAXException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    static void parseDoc(InputStream instream) throws ParserConfigurationException,
            SAXException, IOException {

        BufferedReader buff_read = new BufferedReader(new InputStreamReader(instream, "UTF-8"));
        String inputLine = null;

        while ((inputLine = buff_read.readLine()) != null) {
            System.out.println(inputLine);
        }

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.isIgnoringElementContentWhitespace();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(instream);
    }
    

The errors I am getting:

    [Fatal Error] :1:1: Premature end of file.
    org.xml.sax.SAXParseException: Premature end of file.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
        at com.ameba.api.network.MainApp.parseDoc(MainApp.java:78)
        at com.ameba.api.network.MainApp.main(MainApp.java:41)
grandsirr
Fabii
  • Do you have the XML file you are trying to parse? "Premature end of file" indicates your XML file was not complete. Since you are using a URL connection here, I suspect network issues. The best way to solve this is to capture the XML using Wireshark or a TCP monitor kind of tool and then check whether it is complete. – NiranjanBhat Apr 05 '12 at 04:48
  • @NiranjanBhat Yes, the XML is complete and valid. I have parsed this XML with a direct link. It seems the error only arises when I use an InputStream. – Fabii Apr 05 '12 at 04:56
  • Why are you doing a POST but not sending any data? – user207421 Oct 11 '17 at 00:36

8 Answers


When you do this,

    while ((inputLine = buff_read.readLine()) != null) {
        System.out.println(inputLine);
    }

You consume everything in instream, so there is nothing left to read from instream. Now when you try to do this,

    Document doc = builder.parse(instream);

The parsing will fail, because you have passed it an empty stream.
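If you still want to print the response while debugging, one way (a minimal sketch, assuming the whole response fits in memory and is UTF-8) is to read the stream into a byte array once and then parse a fresh stream over those bytes. The class `ParseTwice` and the method `printAndParse` are hypothetical names; `InputStream.readAllBytes()` needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseTwice {
    // Hypothetical helper: prints the raw XML, then parses it,
    // without reading the original stream twice
    static Document printAndParse(InputStream instream)
            throws IOException, ParserConfigurationException, SAXException {
        // Consume the original stream exactly once (Java 9+)
        byte[] body = instream.readAllBytes();

        // Debug output: decode the buffered bytes (assumes UTF-8)
        System.out.println(new String(body, StandardCharsets.UTF_8));

        // Parse from a new stream over the same bytes
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(body));
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                "<root><item>1</item></root>".getBytes(StandardCharsets.UTF_8));
        Document doc = printAndParse(in);
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "root"
    }
}
```

In the question's code, `parseDoc(connection.getInputStream())` would then be replaced by a call to a helper like this.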

sbridges
  • I removed the readLine() statement, but I am still getting the same error. If I supply the direct link to the XML it works; if I try to process it using connection.getInputStream() it throws that error. – Fabii Apr 05 '12 at 15:07
  • There was also an issue with the stream that was being returned. Problem solved. – Fabii Apr 05 '12 at 18:35
  • @Fabii What was the issue with the stream being returned? I want to know because I'm getting the same issue. – NobleUplift Apr 28 '15 at 14:06
  • You are right, you cannot read an input stream twice. A nice explanation is also here: http://www.danielschneller.com/2008/01/saxparseexception-1-1-premature-end-of.html – lu_ko Oct 09 '15 at 11:08
  • @sbridges, very nicely explained! – Baked Inhalf Jul 13 '16 at 14:16

You are getting the error because the parser is not intelligent enough to deal with "blank states". It expects at least some XML content (a declaration such as <?xml ...?> or a root element), and when the input contains no data at all it throws the exception you see rather than reporting that the input was empty.
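A minimal sketch of detecting the "blank state" up front, so that an empty response produces an explicit error instead of "Premature end of file." The class `EmptyCheck` and the method `parseNonEmpty` are hypothetical; the check peeks at one byte with `java.io.PushbackInputStream` and pushes it back before handing the stream to the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class EmptyCheck {
    // Hypothetical helper: fail with a clear message when the input is empty,
    // instead of letting the parser report "Premature end of file."
    static Document parseNonEmpty(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        PushbackInputStream pin = new PushbackInputStream(in, 1);
        int first = pin.read();
        if (first == -1) {
            throw new IOException("XML input is empty (0 bytes)");
        }
        pin.unread(first); // put the peeked byte back for the parser
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(pin);
    }

    public static void main(String[] args) throws Exception {
        try {
            parseNonEmpty(new ByteArrayInputStream(new byte[0]));
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "XML input is empty (0 bytes)"
        }
    }
}
```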

Nathan Tuggy
mist42nz
  • This answer is so useful. You cannot always debug or change the code when the error comes from a third-party tool. I was in this exact situation: an XML that wasn't supposed to be empty was in fact empty. Thanks, and have my +1. – sampathsris Feb 15 '19 at 04:36

For those who reached this post looking for an answer:

This happens mainly because the InputStream the DOM parser is consuming is empty.

In what I ran across, there are two possible situations:

  1. The InputStream you passed into the parser has already been read and is therefore exhausted.
  2. The file, string, or other source you created the InputStream from is itself empty. In that case the emptiness of the source is what causes the problem, so check the source of your InputStream.
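For situation 2, a minimal sketch that checks a file source before parsing, assuming the XML comes from a local file. `SourceCheck` and `parseFile` are hypothetical names. Parsing from the `File` (rather than from a stream you opened and read earlier) also sidesteps situation 1, because the parser opens its own stream.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class SourceCheck {
    // Hypothetical helper: verify the source is present and non-empty before parsing
    static Document parseFile(Path path)
            throws IOException, ParserConfigurationException, SAXException {
        if (!Files.exists(path)) {
            throw new IOException("XML source does not exist: " + path);
        }
        if (Files.size(path) == 0) {
            throw new IOException("XML source is empty: " + path);
        }
        // Let the parser open a fresh stream on the file itself
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(path.toFile());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("feed", ".xml");
        try {
            parseFile(tmp); // a newly created temp file has 0 bytes
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            Files.delete(tmp);
        }
    }
}
```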
cinqS

I came across the same error, and could easily find what the problem was by logging the exception:

    // `log` is assumed to be a logger field (e.g. SLF4J or Log4j)
    documentBuilder.setErrorHandler(new ErrorHandler() {
        @Override
        public void warning(SAXParseException exception) throws SAXException {
            log.warn(exception.getMessage());
        }

        @Override
        public void fatalError(SAXParseException exception) throws SAXException {
            log.error("Fatal error ", exception);
        }

        @Override
        public void error(SAXParseException exception) throws SAXException {
            log.error("Exception ", exception);
        }
    });

Or, instead of logging the error, you can throw it and catch it where you handle the entries, so you can print the entry itself to get a better indication of the error.
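A minimal sketch of the "throw it and catch it" variant: an `ErrorHandler` that rethrows, and a caller that catches `SAXParseException` and reports which entry failed along with the line and column. `ThrowingHandler`, `check`, and the entry names are hypothetical. Setting a custom handler also replaces the default `[Fatal Error]` line the parser otherwise prints to stderr.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ThrowingHandler {
    // Rethrow every problem (even warnings) so the caller decides how to report it
    static final ErrorHandler RETHROW = new ErrorHandler() {
        @Override
        public void warning(SAXParseException e) throws SAXException {
            throw e;
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            throw e;
        }
    };

    // Returns a one-line description of the failure for this entry, or null if it parsed
    static String check(String entryName, String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(RETHROW);
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            // Report the entry itself together with the position of the error
            return entryName + " line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("empty-entry", ""));
    }
}
```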

Maroun

I resolved the issue by switching the source feed from http://www.news18.com/rss/politics.xml to https://www.news18.com/rss/politics.xml

With http, the code below created an empty file, which caused the error further down the line.

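    // FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils)
    String feedUrl = "https://www.news18.com/rss/politics.xml";
    File feedXmlFile = new File("C://opinionpoll/newsFeed.xml");

    try {
        // Download the feed to a local file, then parse that file
        FileUtils.copyURLToFile(new URL(feedUrl), feedXmlFile);

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(feedXmlFile);
    } catch (IOException | ParserConfigurationException | SAXException e) {
        e.printStackTrace();
    }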

Use an InputStream only once; don't read it multiple times, and call inputstream.close() when you are done with it.
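A minimal sketch of that advice using try-with-resources, which closes the stream even when parsing throws. To parse the same content again, open a new stream instead of reusing the old one. `UseOnce`, `StreamSource`, and `parseOnce` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class UseOnce {
    // Hypothetical source that hands out a freshly opened stream on every call
    interface StreamSource {
        InputStream open() throws IOException;
    }

    // Each stream is given to the parser exactly once and is always closed
    static Document parseOnce(StreamSource source)
            throws IOException, ParserConfigurationException, SAXException {
        try (InputStream in = source.open()) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } // in.close() runs here, even if parse throws
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<feed/>".getBytes(StandardCharsets.UTF_8);
        // Parsing twice works because each call opens a new stream
        System.out.println(parseOnce(() -> new ByteArrayInputStream(xml)).getDocumentElement().getNodeName());
        System.out.println(parseOnce(() -> new ByteArrayInputStream(xml)).getDocumentElement().getNodeName());
    }
}
```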


    <?xml version="1.0" encoding="UTF-8"?>

Make sure this declaration is placed properly: it must be the very first thing in the file, at the top level, and not inside any element of your XML document.
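A minimal sketch showing where the declaration may go: the document parses only when the declaration is the very first thing in it. `DeclarationPlacement` and `parses` are hypothetical names. A misplaced declaration is reported by the parser as a fatal error of its own, not as "Premature end of file."

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DeclarationPlacement {
    // Returns true if the string parses as XML, false on a parse error
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings/errors and rethrows fatal errors,
        // which also silences the default "[Fatal Error]" printout
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        System.out.println(parses(decl + "<root/>"));              // declaration first: true
        System.out.println(parses("\n" + decl + "<root/>"));       // text before it: false
        System.out.println(parses("<root>" + decl + "</root>"));   // inside an element: false
    }
}
```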


Another possible reason: you may need to whitelist your IP address (IPv4) in your MongoDB settings. Hope it resolves the issue!