"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

Question

I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.

I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:

<?xml version="1.0" encoding="utf-8"?> 
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
    <ListDomainsResult>
        <DomainName>Audio</DomainName>
        <DomainName>Course</DomainName>
        <DomainName>DocumentContents</DomainName>
        <DomainName>LectureSet</DomainName>
        <DomainName>MetaData</DomainName>
        <DomainName>Professors</DomainName>
        <DomainName>Tag</DomainName>
    </ListDomainsResult>
    <ResponseMetadata>
        <RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
        <BoxUsage>0.0000071759</BoxUsage>
    </ResponseMetadata>
</ListDomainsResponse>

I pass in this XML to a parser with

XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());

and call eventReader.nextEvent(); a bunch of times to get the data I want.

Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:

com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?> 
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
    at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
    ... (rest of lines omitted)

I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.

It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:

XML with and without the prolog
With and without newlines
With and without the "encoding=" attribute in the prolog
Both newline styles
With and without the chunking information present in the HTTP stream

And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?

Thanks!

We are probably going to need to see some more code. Another possibility is that locally it is not getting chunked while on GAE it is. How are you handling the code before you pass it to the parser ? — Romain Hippeau, Jun 13 '10 at 03:41
I considered the chunking possibility too, but it doesn't seem to be the case since the error message that the parser is throwing contains the entire XML right there (it's pasted above). The entire modified SDK code can be found at http://github.com/AdrianP/aws-sdk-for-java (look at the most recent commits) but there's a LOT of code there. I will try to create a smaller reproducible sample soon, although even that will be hard. It's a big complicated piece of software... Thanks for your feedback though! :) — Adrian Petrescu, Jun 13 '10 at 03:47
possible duplicate of [org.xml.sax.SAXParseException: Content is not allowed in prolog](http://stackoverflow.com/questions/5138696/org-xml-sax-saxparseexception-content-is-not-allowed-in-prolog) — Raedwald, Jul 18 '14 at 10:29
@Raedwald, I don't think it is my question that is the duplicate, since my question was posted a year earlier than that one :) — Adrian Petrescu, Jul 18 '14 at 18:44
The other question is more useful as a canonical question, as it is more general. — Raedwald, Jul 18 '14 at 23:13
@AdrianPetrescu see this MSE answer: http://meta.stackexchange.com/a/147651/170084 — Raedwald, Jul 18 '14 at 23:19
This should be an example of how a question should be asked on SO, reading through it gave me various insights on how to debug as a developer (thanks OP) — Sudip Bhandari, Jan 02 '18 at 13:43

Romain Hippeau · Accepted Answer · 2010-06-13T03:49:51.450

175

One possible cause is that the encodings declared in your XML and XSD (or DTD) are different, for example:
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>

Another possible scenario that causes this is when anything comes before the XML declaration, i.e. you might have something like this in the buffer:

helloworld<?xml version="1.0" encoding="utf-8"?>

or even a space or special character.

There are some special characters called byte order marks (BOMs) that could be in the buffer. Before passing the buffer to the parser, do this:

String xml = "<?xml ...";
xml = xml.trim().replaceFirst("^([\\W]+)<","<");
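A minimal sketch of what that regex does and does not do, since it came up in the comments. The class and method names (`PrologCleaner`, `stripLeadingJunk`, `rootElementName`) are made up for illustration and are not part of the AWS SDK. The regex removes a leading run of non-word characters (a BOM decoded as U+FEFF, whitespace, punctuation) that sits directly in front of the first `<`. It does not remove leading letters or digits, so a prefix like `helloworld` would survive it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PrologCleaner {
    // Hypothetical helper: drops a leading run of non-word characters
    // (BOM, whitespace, punctuation) that precedes the first '<'.
    // Input that already starts with "<?xml" is returned unchanged.
    static String stripLeadingJunk(String xml) {
        return xml.trim().replaceFirst("^([\\W]+)<", "<");
    }

    // Hypothetical helper: parses the string with StAX and returns the
    // local name of the first start element, or null if there is none.
    static String rootElementName(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                return event.asStartElement().getName().getLocalPart();
            }
        }
        return null;
    }

    public static void main(String[] args) throws XMLStreamException {
        // A BOM that was decoded into the string, followed by the real document.
        String dirty = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>";
        String clean = stripLeadingJunk(dirty);
        System.out.println(clean.startsWith("<?xml")); // true
        System.out.println(rootElementName(clean));    // root
    }
}
```

Because `\W` also matches punctuation, this is a blunt tool; it is probably safer to find out where the extra characters come from than to rely on it in production code.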

edited Jun 13 '10 at 03:49

answered Jun 13 '10 at 03:02

Romain Hippeau


Hi Romain, thanks for the response! I've double and triple checked many times for anything in the buffer prior to the prolog (including hidden characters) but there simply isn't anything else there. I'll give switching to utf-16 encoding a try, however -- out of curiosity, where did you get the information that the XSD uses UTF-16? – Adrian Petrescu Jun 13 '10 at 03:21
@Adrian Petrescu Sorry, these are just examples. If you are using DTDs or XSDs, make sure they match your XML. Before you parse the XML, capture it in a String, surround it with '|', and print it to the console. This will tell you if you are passing in some extra characters. – Romain Hippeau Jun 13 '10 at 03:27
Ah, I see :) Unfortunately I tried it and it doesn't appear to be the case in this situation. Thanks anyway! – Adrian Petrescu Jun 13 '10 at 03:32
@Adrian Petrescu I updated my post for you to try something else. Change your XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent()); to ... String xml = response.getContent(); xml = xml.trim().replaceFirst("^([\\W]+)<","<"); XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(xml); – Romain Hippeau Jun 13 '10 at 03:52
Thanks, I'll give this a try soon, even though I previously already checked for byte-order marks; maybe they're being introduced somewhere between the input stream and the XMLReader. – Adrian Petrescu Jun 13 '10 at 04:39
helloworld [something before – Piyush Patel May 10 '12 at 08:21
2

Thanks! This saved me as well. xml.trim().replaceFirst("^([\\W]+)<","<"); – stackoverflow Jan 24 '13 at 20:00
2

Someone please make this the accepted answer. Solved my problem straight away. I was parsing a Message that started with "Message: – Ric Jafe Feb 20 '13 at 14:48
This solves the problem that I am facing with a feed XML from one site. But it breaks for another URL where the parser did not have any problem earlier. I am not quite able to figure out exactly what the regex "^([\\W]+)<" does. I am getting the XML from an input stream. Please explain how this regex works exactly. – codeMan Sep 10 '13 at 08:42
@codeMan the regex replaces all starting whitespaces and starting < with – Romain Hippeau Sep 10 '13 at 09:43
@Raedwald compare the dates. This was answered 3 years earlier. – Romain Hippeau Jul 18 '14 at 11:17
@RomainHippeau see this MSE answer: http://meta.stackexchange.com/a/147651/170084 – Raedwald Jul 18 '14 at 23:20
I also had a case where a character at the end of the prolog was causing the problem. I was getting XML messages where they were putting period after every >, like so ">." This results in a first line that looks like this: "." – BigMac66 Feb 09 '16 at 12:54
@ Romain We can change the encoding using Notepad++. Will it work then? – Vishnu T S Jul 04 '17 at 13:30
@peter maybe but I describe two possible issues in my answer. – Romain Hippeau Jul 04 '17 at 14:53
In my case there was a hidden character in the before – darkman97i Nov 19 '18 at 16:08

score 16 · Answer 2 · answered Jul 27 '18 at 06:24


I ran into this issue after inspecting the XML file in Notepad++ and saving it, even though the file started with the UTF-8 declaration <?xml version="1.0" encoding="utf-8"?>

It was fixed by saving the file in Notepad++ with Encoding (tab) > Encode in UTF-8 selected (it had been Encode in UTF-8-BOM).

answered Jul 27 '18 at 06:24

techloris_109


I had a similar issue. In my case the XML header did not have an encoding attribute. Notepad++ defaulted to UTF-8 encoding. Once I switched Notepad++ encoding to ANSI, the issue stopped. – lafual Feb 20 '22 at 07:29
1

Removing BOMs from all XML files in working directory by Vim: `vim -c ":bufdo set nobomb|update" -c "q" *.xml` . – Fofola Jun 23 '22 at 12:22

Sunmit Girme · Answer 3 · 2014-07-17T07:12:50.877

11

This error message is typically caused by invalid content at the very beginning of the XML document, for example a stray dot “.” before the XML declaration.

Any characters before the “<?xml…” will cause the “org.xml.sax.SAXParseException: Content is not allowed in prolog” error message.

For example, a small dot “.” before the “<?xml…”.

To fix it, just delete any such stray characters before the “<?xml”.

Ref: http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog/

edited Jul 17 '14 at 07:12

answered May 07 '13 at 12:19

Sunmit Girme


3

You should mention where you referred that http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog/ – arulraj.net Jul 16 '14 at 09:37

score 7 · Answer 4 · answered Oct 13 '19 at 08:32


I caught the same error message today. The solution was to change the document from UTF-8 with BOM to UTF-8 without BOM.
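If you cannot re-save the document, for instance because it arrives over the wire, the BOM can be skipped in code instead. Below is a rough sketch, assuming the input is a byte stream and the only thing to remove is a UTF-8 BOM (bytes EF BB BF, i.e. 239, 187, 191 in decimal). `BomSkipper` and `skipUtf8Bom` are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class BomSkipper {
    // Hypothetical helper: returns a stream positioned just after a UTF-8 BOM
    // if one is present; otherwise the stream yields exactly the original bytes.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) clean.read()); // prints <
    }
}
```

The returned stream can then be handed to the parser in place of the original one. Parsers generally cope with a BOM when given the raw bytes, so this kind of workaround mostly matters when the bytes pass through a Reader or String first.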

answered Oct 13 '19 at 08:32

matjung


I had the same issue. Changing file format resolved the issue. Thanks! – code_fish Jun 25 '20 at 16:40
Damn, you are a champ! Would have never guessed this! – Kadaj Feb 02 '23 at 12:51

score 6 · Answer 5 · answered May 23 '14 at 13:59

I was facing the same issue. In my case the XML files were generated by a C# program and fed into an AS400 for further processing. After some analysis I found that I was using UTF8 encoding (which writes a BOM) while generating the XML files, whereas Java (on the AS400) uses "UTF8 without BOM". So I had to write extra code similar to the following:

//create encoding with no BOM
Encoding outputEnc = new UTF8Encoding(false); 
//open file with encoding
TextWriter file = new StreamWriter(filePath, false, outputEnc);           

file.Write(doc.InnerXml);
file.Flush();
file.Close(); // save and close it

score 3 · Answer 6 · answered Feb 09 '15 at 18:03

In my xml file, the header looked like this:

<?xml version="1.0" encoding="utf-16"?>

In a test file, I was reading the file bytes and decoding the data as UTF-8 (not realizing the header in this file was utf-16) to create a string.

byte[] data = Files.readAllBytes(Paths.get(path));
String dataString = new String(data, "UTF-8");

When I tried to deserialize this string into an object, I was seeing the same error:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.

When I updated the second line to

String dataString = new String(data, "UTF-16");

I was able to deserialize the object just fine. So as Romain had noted above, the encodings need to match.
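One way to sidestep this kind of mismatch is to not decode the bytes yourself at all and hand the raw bytes to the parser, which can then work out the encoding from the BOM and the XML declaration. The sketch below assumes a DOM parser is acceptable for the check; `EncodingDemo`, `rootName` and `decodesToXml` are illustrative names, not something from the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingDemo {
    // Hypothetical helper: parses raw bytes and lets the parser detect the
    // encoding itself, then returns the name of the root element.
    static String rootName(byte[] data) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(data));
        return doc.getDocumentElement().getNodeName();
    }

    // Hypothetical helper: true if decoding with the given charset yields
    // text that actually begins with an XML declaration.
    static boolean decodesToXml(byte[] data, Charset charset) {
        return new String(data, charset).startsWith("<?xml");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><note/>";
        // Java's UTF_16 encoder writes big-endian bytes preceded by a BOM.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);

        System.out.println(decodesToXml(utf16, StandardCharsets.UTF_8));  // false
        System.out.println(decodesToXml(utf16, StandardCharsets.UTF_16)); // true
        System.out.println(rootName(utf16));                              // note
    }
}
```

Decoding the UTF-16 bytes as UTF-8 produces a string whose first character is not `<`, which is exactly the situation the parser reports at [row,col]:[1,1].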

score 2 · Answer 7 · answered Jul 18 '18 at 15:21


Removing the xml declaration solved it

<?xml version='1.0' encoding='utf-8'?>

answered Jul 18 '18 at 15:21

F.O.O


score 2 · Answer 8 · edited Jun 20 '20 at 09:12


Unexpected reason: `#` character in file path

Due to some internal bug, the error Content is not allowed in prolog also appears if the file content itself is 100% correct but you are supplying the file name like C:\Data\#22\file.xml.

This may possibly apply to other special characters, too.

How to check: If you move your file into a path without special characters and the error disappears, then it was this issue.
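I can't confirm what the internal bug actually is, but one plausible mechanism (an assumption on my part) is that the path gets turned into a URL or system ID somewhere along the way, where `#` starts the fragment and everything after it is cut off from the path. The sketch below only illustrates that URI behaviour; `HashInPath`, `naive` and `safe` are made-up names.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class HashInPath {
    // Hypothetical helper: builds a URI by pasting the path into a string.
    // A '#' in the path is then interpreted as the start of the fragment.
    static URI naive(String path) throws URISyntaxException {
        return new URI("file://" + path);
    }

    // Hypothetical helper: File.toURI() percent-encodes '#' as %23,
    // so the whole path survives.
    static URI safe(String path) {
        return new File(path).toURI();
    }

    public static void main(String[] args) throws URISyntaxException {
        URI broken = naive("/C:/Data/#22/file.xml");
        System.out.println(broken.getPath());     // /C:/Data/
        System.out.println(broken.getFragment()); // 22/file.xml

        URI ok = safe("/Data/#22/file.xml");
        System.out.println(ok.getRawPath().contains("%23")); // true
        System.out.println(ok.getFragment());                // null
    }
}
```

If this is what happens, the parser ends up reading something other than your file, which would explain why opening the file yourself and passing a stream avoids the error.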

edited Jun 20 '20 at 09:12

Community


answered Feb 19 '19 at 07:32

miroxlav


1

Took me two days to realize that this was the issue. Problem was caused by the Windows-User-Name under which the Tomcat-Service is running. The user name contains a `#` character, so the User-Profiles path also contains this character.... – schlomm Oct 20 '22 at 15:50

score 1 · Answer 9 · edited Mar 25 '15 at 07:22

I was facing the same "Content is not allowed in prolog" problem with my XML file.

Solution

Initially my root folder was named '#Filename'.

When I removed the leading '#' character, the error went away.

There is no need to rename the folder, though. Try it this way:

Instead of passing a File or URL object to the unmarshaller method, use a FileInputStream.

File myFile = new File("........");
Object obj = unmarshaller.unmarshal(new FileInputStream(myFile));

score 1 · Answer 10 · answered Jun 13 '18 at 15:07

In the spirit of "just delete all those weird characters before the <?xml", here's my Java code, which works well with input via a BufferedReader:

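    BufferedReader test = new BufferedReader(new InputStreamReader(fisTest));
    test.mark(4);
    while (true) {
        int earlyChar = test.read();
        System.out.println(earlyChar);
        if (earlyChar == -1) {
            break; // end of stream reached without finding '<'
        }
        if (earlyChar == 60) { // 60 is '<'
            test.reset(); // rewind so that '<' is the next character read
            break;
        } else {
            test.mark(4); // skip this character and remember the new position
        }
    }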

FWIW, the bytes I was seeing are (in decimal): 239, 187, 191.

score 0 · Answer 11 · answered Aug 21 '13 at 13:16


I had a tab character instead of spaces. Replacing the tab '\t' fixed the problem.

Cut and paste the whole doc into an editor like Notepad++ and display all characters.

answered Aug 21 '13 at 13:16

SoloPilot


score 0 · Answer 12 · answered Feb 21 '15 at 14:31


In my instance of the problem, the solution was to replace German umlauts (äöü) with their HTML equivalents...

answered Feb 21 '15 at 14:31

MBaas


score 0 · Answer 13 · answered Dec 12 '16 at 09:36

The following can cause the “org.xml.sax.SAXParseException: Content is not allowed in prolog” exception.

First check the file paths of schema.xsd and file.xml.
The encoding in your XML and XSD (or DTD) should be the same.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-8'?>
Also check whether anything comes before the XML declaration, e.g.: hello<?xml version='1.0' encoding='utf-16'?>

score 0 · Answer 14 · answered Jan 05 '21 at 23:26


I zipped the XML on Mac OS and sent it to a Windows machine. The default compression changes these files, and the resulting encoding triggered this message.

answered Jan 05 '21 at 23:26

htafoya


score 0 · Answer 15 · answered Aug 02 '22 at 09:24

Happened to me with @JmsListener with Spring Boot when listening to IBM MQ. My method received a String parameter and threw this exception when I tried to deserialize it using JAXB.

It seemed that the string I got was the result of byte[].toString(). It was a list of comma-separated numbers.

I solved it by changing the parameter type to byte[] and then created a String from it:

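@JmsListener(destination = "Q1")
public void receiveQ1Message(byte[] msgBytes) {
    // Decode the raw payload ourselves instead of receiving a pre-converted String
    var msg = new String(msgBytes);
    // ... deserialize msg with JAXB as before
}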

score 0 · Answer 16 · answered Feb 27 '23 at 11:29

I had encountered this message when running a test case in SoapUI:

org.xml.sax.SAXParseException; systemId: file://; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

After quite some time I figured out the reason being the following line:

def holder = groovyUtils.getXmlHolder("SoapCall#Request") // Get Request body

And the reason was that the test step was actually named "SOAPCall" and not "SoapCall". I suppose the returned string was empty, which caused the "prolog" error.

