206

I have a Java-based web service client connected to a Java web service (implemented on the Axis1 framework).

I am getting the following exception in my log file:

Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
    at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
    at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
    at org.apache.ws.axis.security.WSDoAllReceiver.invoke(WSDoAllReceiver.java:114)
    at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
    at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
    at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
    at org.apache.axis.client.AxisClient.invoke(AxisClient.java:198)
    at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
    at org.apache.axis.client.Call.invoke(Call.java:2767)
    at org.apache.axis.client.Call.invoke(Call.java:2443)
    at org.apache.axis.client.Call.invoke(Call.java:2366)
    at org.apache.axis.client.Call.invoke(Call.java:1812)
abhi
  • 1,760
  • 1
  • 24
  • 40
ag112
  • 5,537
  • 2
  • 23
  • 42
  • 12
    It would help if you showed us the XML you are trying to parse. (Just the first few lines would do, I expect.) – Stephen C Feb 28 '11 at 06:15
  • 1
    Thanks Stephen, I am trying to retrieve XML Request from AXIS framework and paste it here. So general understanding of above error is XML is not well-formed. – ag112 Feb 28 '11 at 06:51
  • 1
    I had this issue because I was trying to transform the string name of the xml file rather than the xml file as a string! :P – Gaʀʀʏ May 16 '13 at 18:34
  • 1
    Notepad++ and change the Encoding works fine to me! – Guilherme Aug 05 '20 at 11:10

32 Answers

274

This is often caused by a white space before the XML declaration, but it could be any text, like a dash or any character. I say often caused by white space because people assume white space is always ignorable, but that's not the case here.


Another common cause is a UTF-8 BOM (byte order mark). A BOM is allowed before the XML declaration, but it can be treated as whitespace, and therefore as content in the prolog, if the document is handed to the XML parser as a stream of characters rather than as a stream of bytes.

The same can happen if schema files (.xsd) are used to validate the XML file and one of the schema files has a UTF-8 BOM.
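To make the character-stream versus byte-stream difference concrete, here is a minimal sketch, assuming the JDK's built-in JAXP parser. The class and method names (BomParseDemo, parsesAsCharacters, parsesAsBytes) are made up for this illustration. It feeds the same BOM-prefixed document to the parser once through a StringReader and once as raw UTF-8 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of any library.
class BomParseDemo {
    // A well-formed document preceded by U+FEFF, the character a UTF-8 BOM decodes to.
    static final String XML_WITH_BOM =
            "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Hands the document to the parser as characters: the BOM arrives as an ordinary character.
    static boolean parsesAsCharacters(String xml) throws Exception {
        try {
            newBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    // Hands the document to the parser as bytes: the parser does its own encoding detection.
    static boolean parsesAsBytes(String xml) throws Exception {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            newBuilder().parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("as characters: " + parsesAsCharacters(XML_WITH_BOM));
        System.out.println("as bytes:      " + parsesAsBytes(XML_WITH_BOM));
    }
}
```

With the default JDK parser, the character-stream call is the one expected to fail; other parser implementations may behave differently, so treat this as a diagnostic aid rather than a guarantee.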

ParkerHalo
  • 4,341
  • 9
  • 29
  • 51
Mike Sokolov
  • 6,914
  • 2
  • 23
  • 31
  • 31
    For everyone like me, who struggles to understand what to do with John Humphreys - w00te's suggestion: change `Document document = documentBuilder.parse(new InputSource(new StringReader(xml)))` to `Document document = documentBuilder.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("UTF-8"))))` – RealMan Apr 29 '18 at 06:22
  • @RealMan, does this solution supports in groovy code? Because I got error message as: **groovy.lang.MissingPropertyException**: No such property: documentBuilder for class: script1656445880890. Which classes should I need to import if this is the classes issue? – Sagar Sikchi Jun 28 '22 at 20:07
  • 1
    A good option to check if your file has a UTF-8 BOM or not is to call `file filename.xml` (linux). Returns e.g.: `filename.xml: XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators` – mnikley Sep 19 '22 at 13:21
39

In addition to Yuriy Zubarev's post:

This also happens when you pass a nonexistent XML file to the parser. For example, you pass

new File("C:/temp/abc")

when only the file C:/temp/abc.xml exists on your file system.

In either case,

builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = builder.parse(new File("C:/temp/abc"));

or

DOMParser parser = new DOMParser();
parser.parse("file:C:/temp/abc");

All give the same error message.

Very disappointing bug, because the following trace

javax.servlet.ServletException
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
...
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
... 40 more

doesn't say anything about the file name being incorrect or the file not existing. In my case I had an absolutely correct XML file and had to spend 2 days to determine the real problem.
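One way to rule out the missing-file case described above is to check the path before handing it to the parser. This is only a sketch: SafeXmlLoader and parseExistingFile are names invented for the example, and the explicit check simply turns a misleading parse error into a clear FileNotFoundException.

```java
import java.io.File;
import java.io.FileNotFoundException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper; not part of any library.
class SafeXmlLoader {
    // Fails fast with a descriptive message if the path is missing or is a directory.
    static Document parseExistingFile(File file) throws Exception {
        if (!file.isFile()) {
            throw new FileNotFoundException(
                    "Not an existing regular file: " + file.getAbsolutePath());
        }
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
    }
}
```

Calling SafeXmlLoader.parseExistingFile(new File("C:/temp/abc")) would then report the wrong path directly instead of a prolog error.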

rogerdpack
  • 62,887
  • 36
  • 269
  • 388
Egor
  • 551
  • 5
  • 8
  • 1
    Same with trying to parse a directory instead of a filename, FWIW. – rogerdpack Mar 26 '15 at 22:39
  • 1
    ... @Egor this is why everyone hates XML. Losing 2 days of work for such a stupid failure.. – Gewure Sep 07 '17 at 09:56
  • 1
    Absolutely agree @Gewure :) That was some ancient post from 2012 and I even forget about it, but true – Egor Sep 10 '17 at 20:26
  • 4
    This also happens, when you have a correct path, but with special symbols, like: C:\#MyFolder\My.XML The file exists, but the "#" brings problem to XML parser... Java itself, as well as M$ Windows, has no problem with this folder name.... Very bad exception message behavior .... – Alex Jan 22 '18 at 15:38
  • 1
    This was a similar problem of mine. I've spent hours trying to understand what was the problem, and I did not even think about a malformed parameter. – Balázs Börcsök Jun 29 '21 at 12:22
  • 1
    I had to build the project. Apparently my file located did not pick up the file from the src rather it was looking in the target folder. – Viktor Reinok Jul 20 '22 at 12:10
  • or if you pass a filename instead of the file content as xml :) (coworker managed this gem) – fl0w Dec 08 '22 at 11:23
28

Try adding a space between the encoding="UTF-8" string in the prolog and the terminating ?>. In XML the prolog designates this bracket-question mark delimited element at the start of the document (while the tag prolog in stackoverflow refers to the programming language).

Added: Is that dash in front of your prolog part of the document? That would be the error there, having data in front of the prolog, -<?xml version="1.0" encoding="UTF-8"?>.

hardmath
  • 8,753
  • 2
  • 37
  • 65
  • 2
    +1. I have found that some XML parsers barf this exception even when the XML prolog contains spaces - so I think it is definitely worth checking that nothing precedes the ` –  Mar 02 '11 at 01:33
15

I had the same problem (and solved it) while trying to parse an XML document with freemarker.

I had no spaces before the header of XML file.

In my case, the problem occurred when the file encoding and the XML encoding attribute were different (e.g. a UTF-8 file with a UTF-16 attribute in the header).

So I had two ways of solving the problem:

  1. changing the encoding of the file itself
  2. changing the header UTF-16 to UTF-8
JoshDM
  • 4,939
  • 7
  • 43
  • 72
  • 2
    I guess that in general any case where the parser receives conflicting information about the character encoding could cause this problem. – Raedwald Jul 18 '14 at 09:51
  • 1
    It's been a long time since this answer, but this worked for me in 2021. I am user Pester testing in a Jenkins pipeline and kept getting the "content in prolog" error. I saw that the JUnit result file is in UTF16, and i was Out-File'ing to UTF8 out of habit. When i changed to UTF-16, it worked. `Invoke-Pester -Script resources/*.Tests.ps1 -PassThru | ConvertTo-JUnitReport -AsString | Out-File -Encoding utf-16 .\results.xml` – Max Cascone Apr 29 '21 at 18:37
13

It means the XML is malformed or the response body is not an XML document at all.

Yuriy Zubarev
  • 2,821
  • 18
  • 24
9

Just spent 4 hours tracking down a similar problem in a WSDL. Turns out the WSDL used an XSD which imports another namespace XSD. This imported XSD contained the following:

<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://www.xyz.com/Services/CommonTypes" elementFormDefault="qualified"
    xmlns="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:CommonTypes="http://www.xyz.com/Services/CommonTypes">

 <include schemaLocation=""></include>  
    <complexType name="RequestType">
        <....

Note the empty include element! This was the root of my woes. I guess this is a variation on Egor's file not found problem above.

+1 to disappointing error reporting.

colin_froggatt
  • 147
  • 1
  • 10
6

My answer probably won't help you, but it helps with this problem in general.

When you see this kind of exception, try opening your XML file in a hex editor. Sometimes you will see additional bytes at the beginning of the file that a text editor doesn't show.

Delete them and your xml will be parsed.
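If no hex editor is at hand, the same check can be sketched in a few lines of Java. The class LeadingBytes and its methods are hypothetical names for this example; it prints the first bytes of a file in hex and reports whether they are the UTF-8 BOM (EF BB BF).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; not part of any library.
class LeadingBytes {
    // Formats up to 'count' leading bytes as space-separated uppercase hex pairs.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // True if the data starts with the three-byte UTF-8 byte order mark.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file for simplicity; fine for a quick diagnostic.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("first bytes: " + hexPrefix(data, 16));
        System.out.println("UTF-8 BOM:   " + startsWithUtf8Bom(data));
    }
}
```

A clean file starts with 3C 3F 78 6D 6C, which is the byte sequence for the characters <?xml.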

Igor Kustov
  • 787
  • 1
  • 8
  • 21
5

In my case, removing the 'encoding="UTF-8"' attribute altogether worked.

It looks like a character set encoding issue, maybe because your file isn't really in UTF-8.

Jerome Louvel
  • 2,882
  • 18
  • 19
4

For the same issue, I removed the following lines:

  File file = new File("c:\\file.xml");
  InputStream inputStream= new FileInputStream(file);
  Reader reader = new InputStreamReader(inputStream,"UTF-8");
  InputSource is = new InputSource(reader);
  is.setEncoding("UTF-8");

It is working fine now. I am not sure why specifying UTF-8 caused a problem. To my surprise, it also works fine for UTF-8 files.

I am using Windows 7 32-bit and the NetBeans IDE with Java *jdk1.6.0_13*. No idea how it works.

Dineshkumar
  • 1,468
  • 4
  • 22
  • 49
4

Sometimes it's the code, not the XML

The following code,

Document doc = dBuilder.parse(new InputSource(new StringReader("file.xml")));

will also result in this error,

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

because it's attempting to parse the string literal, "file.xml" (not the contents of the file.xml file) and failing because "file.xml" as a string is not well-formed XML.

Fix: Remove StringReader():

Document doc = dBuilder.parse(new InputSource("file.xml"));

Similarly, dirty buffer problems can leave residual junk ahead of the actual XML. If you've carefully checked your XML and are still getting this error, log the exact contents being passed to the parser; sometimes what's actually being (tried to be) parsed is surprising.

kjhughes
  • 106,133
  • 27
  • 181
  • 240
  • 1
    This solution guided in right path as I forgot to add the `applicaionContext.xml` path in code, and was not checking in code was looking for error in XML file only – Mrinmoy May 07 '20 at 15:32
4

First clean the project, then rebuild it. I was facing the same issue, and everything was fine after this.

Bibin Johny
  • 3,157
  • 1
  • 13
  • 16
4

To fix the BOM issue on Unix / Linux systems:

  1. Check if there's an unwanted BOM character: `hexdump -C myfile.xml | more`. An unwanted BOM character will appear at the start of the file as `...<?xml>`.

  2. Alternatively, run `file myfile.xml`. A file with a BOM character will appear as: `myfile.xml: XML 1.0 document text, UTF-8 Unicode (with BOM) text`

  3. Fix a single file with: `tail -c +4 myfile.xml > temp.xml && mv temp.xml myfile.xml`

  4. Repeat 1 or 2 to check the file has been sanitised. It is probably also sensible to run `view myfile.xml` to check that the contents have stayed intact.

Here's a bash script to sanitise a whole folder of XML files:

#!/usr/bin/env bash

# This script is to sanitise XML files to remove any BOM characters

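# $'...' makes bash pass the raw BOM bytes to grep; a plain '\xef\xbb\xbf'
# pattern is not interpreted as hex escapes by grep without -P.
has_bom() { head -c3 "$1" | LC_ALL=C grep -q $'\xef\xbb\xbf'; }

for filename in *.xml ; do
  if has_bom "${filename}"; then
    # Skip the first 3 bytes (the BOM) and replace the original file.
    tail -c +4 "${filename}" > temp.xml
    mv temp.xml "${filename}"
  fi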
done

Lydia Ralph
  • 1,455
  • 1
  • 17
  • 33
3

If all else fails, open the file in binary mode to make sure there are no funny characters at the beginning of the file (for example, the 3 non-printable characters that identify the file as UTF-8). We did this and found some, so we converted the file from UTF-8 to ASCII and it worked.

Ralph
  • 31
  • 1
3

As Mike Sokolov has already pointed out, one possible cause is the presence of one or more characters (such as whitespace) before the <?xml tag.

If your input XML is being read as a String (as opposed to a byte array), you can replace your input string using the code below to make sure that all unnecessary characters before the xml tag are stripped.

inputXML=inputXML.substring(inputXML.indexOf("<?xml"));

You need to be sure that the input XML actually contains the xml tag, though; otherwise indexOf returns -1 and substring throws an exception.
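A slightly more defensive variant of the one-liner above might look like the following sketch. XmlPrefixStripper and stripBeforeDeclaration are hypothetical names; the only change is that a missing declaration produces a descriptive error instead of an index exception.

```java
// Hypothetical helper; not part of any library.
class XmlPrefixStripper {
    // Drops everything before the first "<?xml"; rejects input that has no declaration.
    static String stripBeforeDeclaration(String inputXML) {
        int start = inputXML.indexOf("<?xml");
        if (start < 0) {
            throw new IllegalArgumentException("Input does not contain an XML declaration");
        }
        return inputXML.substring(start);
    }
}
```

Note that this only works for documents that carry an XML declaration; a document that starts directly with its root element would be rejected.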

Sahil J
  • 685
  • 6
  • 10
3

What I have tried [Did not work]

In my case the web.xml in my application had an extra space. Even after I deleted it, it did not work!

I was playing with logging.properties and web.xml in my Tomcat, but even after I reverted the changes, the error persisted!

Solution

To be specific, I tried adding

org.apache.catalina.filters.ExpiresFilter.level = FINE

Tomcat expire filter is not working correctly

extra space

shareef
  • 9,255
  • 13
  • 58
  • 89
2

I followed the instructions found here and I got the same error.

I tried several things to solve it (e.g. changing the encoding, typing the XML file rather than copy-pasting it, etc.) in Notepad and XML Notepad, but nothing worked.

The problem was solved when I edited and saved my XML file in Notepad++ (Encoding --> UTF-8 without BOM).

2

In my case I got this error because the API I used could return the data either in XML or in JSON format. When I tested it using a browser, it defaulted to the XML format, but when I invoked the same call from a Java application, the API returned the JSON formatted response, that naturally triggered a parsing error.
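A cheap guard against this situation is to look at the first meaningful character of the response body before parsing it. The sketch below uses invented names (ResponseSniffer, looksLikeXml) and is only a heuristic: it says nothing about well-formedness, it just catches bodies such as JSON or plain-text error pages early.

```java
// Hypothetical helper; not part of any library.
class ResponseSniffer {
    // True if the first character that is neither whitespace nor a BOM is '<'.
    static boolean looksLikeXml(String body) {
        if (body == null) {
            return false;
        }
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\uFEFF' || Character.isWhitespace(c)) {
                continue; // skip leading BOM and whitespace
            }
            return c == '<';
        }
        return false; // empty or whitespace-only body
    }
}
```

If looksLikeXml returns false, logging the start of the body (and the response's content type, if available) usually shows quickly what the server actually sent.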

zovits
  • 906
  • 16
  • 27
1

I was also getting a similar error:

XML reader error: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,2] Message: Reference is not allowed in prolog.

when my application was creating an XML response for a RESTful web service call. While building the XML string, I replaced the &lt; and &gt; entities with literal < and > characters; the error then went away and I got a proper response. Not sure how it worked, but it worked.

sample:

String body = "<ns:addNumbersResponse xmlns:ns=\"http://java.duke.org\"><ns:return>"
            +sum
            +"</ns:return></ns:addNumbersResponse>";
shareef
  • 9,255
  • 13
  • 58
  • 89
Satish M
  • 21
  • 1
1

I had the same issue.

First I downloaded the XML file to my local desktop, and I got "Content is not allowed in prolog" while importing the file into the portal server. The file looked fine to me visually, but somehow it was corrupted.

So I downloaded the same file again, tried the import again, and it worked.

Marko
  • 20,385
  • 13
  • 48
  • 64
paresh
  • 11
  • 1
1

We had the same problem recently, and it turned out to be a bad URL and consequently a standard 403 HTTP response (which obviously isn't the valid XML the client was looking for). I'm going to share the details in case someone in the same context runs into this problem:

This was a Spring based web application in which a "JaxWsPortProxyFactoryBean" bean was configured to expose a proxy for a remote port.

<bean id="ourPortJaxProxyService"
    class="org.springframework.remoting.jaxws.JaxWsPortProxyFactoryBean"
    p:serviceInterface="com.amir.OurServiceSoapPortWs"
    p:wsdlDocumentUrl="${END_POINT_BASE_URL}/OurService?wsdl"
    p:namespaceUri="http://amir.com/jaxws" p:serviceName="OurService"
    p:portName="OurSoapPort" />

The "END_POINT_BASE_URL" is an environment variable configured in "setenv.sh" of the Tomcat instance that hosts the web application. The content of the file is something like this:

export END_POINT_BASE_URL="http://localhost:9001/BusinessAppServices"
#export END_POINT_BASE_URL="http://localhost:8765/BusinessAppServices"

The missing ";" after each line caused the malformed URL and thus the bad response. That is, instead of "BusinessAppServices/OurService?wsdl" the URL had a CR before "/". "TCP/IP Monitor" was quite handy while troubleshooting the problem.

Amir Keibi
  • 1,991
  • 28
  • 45
1

For all those that get this error: WARNING: Catalina.start using conf/server.xml: Content is not allowed in prolog.

Not very informative.. but what this actually means is that there is garbage in your conf/server.xml file.

I have seen this exact error in other XML files.. this error can be caused by making changes with a text editor which introduces the garbage.

The way you can verify whether or not you have garbage in the file is to open it with a "HEX Editor" If you see any character before this string

     "<?xml version="1.0" encoding="UTF-8"?>"

like this would be garbage

     "‰ŠŒ<?xml version="1.0" encoding="UTF-8"?>"

that is your problem. The solution is to use a good hex editor, one that allows you to save files with different types of encoding.

Then just save the file as UTF-8. Some systems that use XML files may need it saved as UTF-8 without BOM, that is, with no byte order mark.

Hope this helps someone out there!!

CA Martin
  • 307
  • 2
  • 7
1

For me, a Build->Clean fixed everything!

FabioStein
  • 750
  • 7
  • 23
1

I had the same problem with some XML files. I solved it by reading the file with ANSI encoding (Windows-1252) and writing it back with UTF-8 encoding, using a small Python script. I tried to use Notepad++ but didn't have success:

import os

path = os.path.dirname(os.path.abspath(__file__))

input_name = 'my_input_file.xml'
output_name = 'my_output_file.xml'

if __name__ == "__main__":
    # Read the file as Windows-1252 (ANSI) and write it back out as UTF-8.
    with open(os.path.join(path, input_name), 'r', encoding='cp1252') as f1:
        content = f1.read()
    with open(os.path.join(path, output_name), 'w', encoding='utf-8') as f2:
        f2.write(content)
Ângelo Polotto
  • 8,463
  • 2
  • 36
  • 37
1

Just an additional thought on this one for the future. This error can also appear when you accidentally hit the delete key or some other key while an XML window is the active display and you are not paying attention. This has happened to me before with the struts.xml file in my web application. Clumsy elbows ...

demongolem
  • 9,474
  • 36
  • 90
  • 105
0

I faced a similar problem too. The reason was a garbage character at the beginning of the file.

Fix: open the file in a text editor (tested with Sublime Text), remove any indentation at the start of the file, copy and paste all the content into a new file, and save it. That's it! When I ran the new file, it was parsed without any errors.

Aditya Gaykar
  • 470
  • 1
  • 5
  • 10
0

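I took Dineshkumar's code and modified it to validate my XML file correctly:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.apache.log4j.Logger;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// BusinessException and ParserErrorHandler are the answer author's own classes (not shown).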

public class Myclass{

private static final Logger LOGGER = Logger.getLogger(Myclass.class);

/**
 * Validate XML file against Schemas XSD in pathEsquema directory
 * @param pathEsquema directory that contains XSD Schemas to validate
 * @param pathFileXML XML file to validate
 * @throws BusinessException if it throws any Exception
 */
public static void validarXML(String pathEsquema, String pathFileXML) 
 throws BusinessException{ 
 String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
 String nameFileXSD = "file.xsd";
 String MY_SCHEMA1 = pathEsquema+nameFileXSD;
 ParserErrorHandler parserErrorHandler;
 try{
  SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA);
  
  Source [] source = { 
   new StreamSource(new File(MY_SCHEMA1))
   };
  Schema schemaGrammar = schemaFactory.newSchema(source);

  Validator schemaValidator = schemaGrammar.newValidator();
  schemaValidator.setErrorHandler(
   parserErrorHandler= new ParserErrorHandler());
  
  /** validate xml instance against the grammar. */
  File file = new File(pathFileXML);
  InputStream isS= new FileInputStream(file);
  Reader reader = new InputStreamReader(isS,"UTF-8");
  schemaValidator.validate(new StreamSource(reader));
  
  if(parserErrorHandler.getErrorHandler().isEmpty()&& 
   parserErrorHandler.getFatalErrorHandler().isEmpty()){
   if(!parserErrorHandler.getWarningHandler().isEmpty()){
    LOGGER.info(
    String.format("WARNING validate XML:[%s] Descripcion:[%s]",
     pathFileXML,parserErrorHandler.getWarningHandler()));
   }else{
    LOGGER.info(
    String.format("OK validate  XML:[%s]",
     pathFileXML));
   }
  }else{
   throw new BusinessException(
    String.format("Error validate  XML:[%s], FatalError:[%s], Error:[%s]",
    pathFileXML,
    parserErrorHandler.getFatalErrorHandler(),
    parserErrorHandler.getErrorHandler()));
  }  
 }
 catch(SAXParseException e){
  throw new BusinessException(String.format("Error validate XML:[%s], SAXParseException:[%s]",
   pathFileXML,e.getMessage()),e);
 }
 catch (SAXException e){
  throw new BusinessException(String.format("Error validate XML:[%s], SAXException:[%s]",
   pathFileXML,e.getMessage()),e);
 }
 catch (IOException e) {
  throw new BusinessException(String.format("Error validate XML:[%s], IOException:[%s]",
   pathFileXML,e.getMessage()),e);
 }
 
}

}
RodH
  • 13
  • 4
0

I had the same issue with Spring's

MarshallingMessageConverter

combined with my own pre-processing code.

In case someone needs the reason: BytesMessage#readBytes consumes the bytes, and I forgot that reading is a one-way operation. You cannot read the message twice.

Learning Always
  • 1,563
  • 4
  • 29
  • 49
Artem Ptushkin
  • 1,151
  • 9
  • 17
0

Try with BOMInputStream in apache.commons.io:

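// Imports this snippet needs. JAXB lives in javax.xml.bind on older stacks
// and in jakarta.xml.bind on newer ones. SchemaType is the answer author's own type.
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.transform.stream.StreamSource;
import org.apache.commons.io.input.BOMInputStream;
import org.xml.sax.SAXException;

public static <T> T getContent(Class<T> instance, SchemaType schemaType, InputStream stream) throws JAXBException, SAXException, IOException {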

    JAXBContext context = JAXBContext.newInstance(instance);
    Unmarshaller unmarshaller = context.createUnmarshaller();
    Reader reader = new InputStreamReader(new BOMInputStream(stream), "UTF-8");

    JAXBElement<T> entry = unmarshaller.unmarshal(new StreamSource(reader), instance);

    return entry.getValue();
}
0

I was having the same problem while parsing the info.plist file on my Mac. The problem was fixed by the following command, which converted the file to XML.

plutil -convert xml1 info.plist

Hope that helps someone.

Reaz Murshed
  • 23,691
  • 13
  • 78
  • 98
0

I encountered a similar problem with the Jenkins JUnit report plugin. It turns out you have to specify *.xml, even if you create the JUnit XML in the home directory. (So Test report XMLs: *.xml (or targeted_directory/*.xml).)

NotTooTechy
  • 448
  • 5
  • 9
0

The reason was the spaces between the tags.

' <?xml version="1.0" encoding="UTF-8" standalone="no"?> <sign: ....'

Delete spaces.

Kairat Koibagarov
  • 1,385
  • 15
  • 9
0

Structure your document like this:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
    %children%
</root>
Laurel
  • 5,965
  • 14
  • 31
  • 57
Pavel
  • 4,912
  • 7
  • 49
  • 69