
I have the following (test) XML file and Java code that uses StAX. I want to apply this code to a file that is about 30 GB in size but has fairly small elements, so I thought StAX would be a good choice. I am getting the following error:

Exception in thread "main" javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
    at at.tuwien.mucke.util.xml.staxtest.StaXTest.main(StaXTest.java:18)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

<?xml version='1.0' encoding='utf-8'?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <price>44.95</price>
      <description>An in-depth look at creating applications 
       with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <price>5.95</price>
      <description>A former architect battles corporate zombies, 
       an evil sorceress, and her own childhood to become queen 
       of the world.</description>
   </book>
</catalog>

Here is the code:

package xml.staxtest;

import java.io.*;
import javax.xml.stream.*;

public class StaXTest {

    public static void main(String[] args) throws Exception {

        XMLInputFactory xif = XMLInputFactory.newInstance();
        // FileReader decodes the file with the platform's default charset,
        // not with the encoding named in the XML declaration
        XMLStreamReader streamReader = xif.createXMLStreamReader(new FileReader("D:/Data/testFile.xml"));

        // Pull events one at a time; only the current event is held in memory
        while (streamReader.hasNext()) {
            int eventType = streamReader.next();

            // Print the name of every opening tag
            if (eventType == XMLStreamReader.START_ELEMENT) {
                System.out.println(streamReader.getLocalName());
            }

            //... more to come here later ...
        }
    }

}
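
As a sketch of how the `//... more to come here later ...` loop might continue for the 30 GB file, the class below reads each `<book>`'s `id` attribute and `<title>` text and hands them on one at a time instead of collecting them. The class name `BookTitles`, the method `readTitles` and the `Consumer` callback are hypothetical, and the code assumes the big file has the same `catalog`/`book`/`title` layout as the test file. It also passes a `FileInputStream` rather than a `FileReader`, so the parser, not the platform default charset, decides how to decode the bytes.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class BookTitles {

    // Streams through the document and passes "id: title" for every <book>
    // to the sink as soon as it is read; nothing is accumulated here.
    static void readTitles(InputStream in, Consumer<String> sink) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // An InputStream (rather than a Reader) lets the parser pick the
        // encoding from the XML declaration.
        XMLStreamReader reader = xif.createXMLStreamReader(in);
        String currentId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("book".equals(name)) {
                    // null namespace URI: match the attribute by local name only
                    currentId = reader.getAttributeValue(null, "id");
                } else if ("title".equals(name)) {
                    // getElementText() consumes everything up to </title>
                    sink.accept(currentId + ": " + reader.getElementText());
                }
            }
        } finally {
            reader.close(); // frees parser resources; the caller closes 'in'
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java BookTitles D:/Data/testFile.xml
        try (InputStream in = new FileInputStream(args[0])) {
            readTitles(in, System.out::println);
        }
    }
}
```

Because each title is handed to the callback immediately, memory use stays flat regardless of file size; what the callback does with it (print, write to a database, etc.) is up to the caller.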

RalfB
  • Have you tried removing the XML declaration? (It is recommended but still optional.) – Apolo Jun 03 '14 at 09:03
  • Yes. And I also found out that I get the same result when the file is empty. There seems to be something wrong with the file itself... Encoding? Hidden characters? – RalfB Jun 03 '14 at 09:13
  • Solved it! I added <?xml version="1.0" encoding="ISO-8859-1" ?> and had to store the file in ANSI (as Notepad++ had assumed UTF-8). Silly! – RalfB Jun 03 '14 at 09:20
  • I don't think that's a good idea. XML uses UTF-8 by default for a reason. I'm pretty sure your problem stems from the (unnecessary) use of a UTF-8 [BOM (Byte Order Mark)](http://en.wikipedia.org/wiki/Byte_order_mark). Remove that (and stop whatever application is inserting it from doing so) and you should be fine. – Tim Pietzcker Jun 03 '14 at 09:25
  • See also http://stackoverflow.com/questions/5138696/org-xml-sax-saxparseexception-content-is-not-allowed-in-prolog – Raedwald Jul 18 '14 at 09:33
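
If Tim Pietzcker's BOM diagnosis is right, one way to cope without re-saving the file is to skip the three BOM bytes (`EF BB BF`) before StAX sees them. This is a minimal sketch assuming a UTF-8 file; `BomUtil` and `skipUtf8Bom` are hypothetical names, not part of any library. It only inspects the first three bytes, so it works unchanged on a 30 GB file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class BomUtil {

    // U+FEFF (byte order mark) encoded as UTF-8
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned just after a leading UTF-8 BOM, if present.
    // If the first bytes are not a BOM they are pushed back unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than asked for, so loop until full or EOF
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length
                && head[0] == UTF8_BOM[0]
                && head[1] == UTF8_BOM[1]
                && head[2] == UTF8_BOM[2];
        if (!isBom && n > 0) {
            pb.unread(head, 0, n);
        }
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java BomUtil D:/Data/testFile.xml
        try (InputStream in = skipUtf8Bom(new FileInputStream(args[0]))) {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in, "UTF-8");
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    System.out.println(r.getLocalName());
                }
            }
            r.close();
        }
    }
}
```

Removing the BOM from the file itself, as the comment suggests, is still the simpler fix; a wrapper like this mainly helps when you cannot control the program that writes the file.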

1 Answer


Solved it!

I added the encoding to the XML declaration, <?xml version="1.0" encoding="ISO-8859-1" ?>, and had to save the file as ANSI (Notepad++ had assumed UTF-8). Silly!
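
This fix works because the bytes on disk and the `encoding` in the XML declaration now agree. The sketch below illustrates that, assuming the saved file really is ISO-8859-1 (Notepad++'s "ANSI" is the Windows code page, which for Western European text is windows-1252, close to but not identical with ISO-8859-1). It builds the document in memory instead of reading a file; `DeclaredEncodingDemo` and `firstTitle` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DeclaredEncodingDemo {

    // Returns the text of the first <title> element. The input is handed to
    // StAX as raw bytes, so the parser decodes it using the encoding named in
    // the XML declaration.
    static String firstTitle(InputStream in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(r.getLocalName())) {
                    return r.getElementText();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // A document stored as ISO-8859-1: the e-acute is the single byte 0xE9
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<catalog><book><title>Caf\u00e9</title></book></catalog>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstTitle(new ByteArrayInputStream(latin1)));
    }
}
```

Here `firstTitle` returns `Café`. If the same bytes were declared as UTF-8 (or had no declaration, so UTF-8 is assumed), the lone `0xE9` byte would not be valid UTF-8 and the parser would report an error instead.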

Apolo
RalfB