37

I'm using Java and I'm trying to get an XML document from an HTTP link. The code I'm using is:

URL url = new URL(link);

HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("GET");
connection.connect();
Document doc = null;

CountInputStream in = new CountInputStream(url.openStream());
doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

Don't pay attention to CountInputStream; it's a special class that acts like a regular input stream.

Using the code above, I sometimes get the error Fatal Error :1:1: Content is not allowed in prolog. I assume that it has something to do with badly formatted XML, but I have no idea how to fix it.

posdef
guest86
  • 2
    possible duplicate of [Java parsing XML document gives "Content not allowed in prolog." error](http://stackoverflow.com/questions/2599919/java-parsing-xml-document-gives-content-not-allowed-in-prolog-error) – Noel M Jul 20 '12 at 10:22
  • Well, as I understand it, the thread you're referring to is about reading XML from disk. In my case I don't have the XML on disk; I just have a string (the link), and I get the error before I get the XML file... – guest86 Jul 20 '12 at 10:24
  • Can you give the URL in question? The most likely cause of this is a malformed response, so a look at that would be valuable. – MvG Jul 20 '12 at 10:42
  • It does not matter where the XML file originates from; the errors are still the same. "Content not allowed in prolog" means that something other than an opening tag was found at the beginning of the file/stream. If it contains extra spaces, just trim them, but generally this sort of error is not (programmatically) recoverable. – Germann Arlington Jul 20 '12 at 10:44
  • After reading your comments I manually checked the response of the HTTP page and it really did contain badly formatted XML... sorry for bothering you, I never had problems like that before... :\ – guest86 Jul 20 '12 at 10:56
  • The fault is in the resource you are reading, which you have not provided. – Raedwald Jul 18 '14 at 09:54
  • Possible duplicate of http://stackoverflow.com/questions/5138696/org-xml-sax-saxparseexception-content-is-not-allowed-in-prolog – Raedwald Jul 18 '14 at 09:55

7 Answers

36

I'm turning my comment to an answer, so it can be accepted and this question no longer remains unanswered.

The most likely cause of this is a malformed response, which includes characters before the initial <?xml …>. So please have a look at the document as transferred over HTTP, and fix this on the server side.
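
One way to look at the document as transferred is to dump the first few bytes of the response body before handing it to the parser. The sketch below is only an illustration: the class name PrologPeek and the helper hexPrefix are made up for this example, and the byte array in main is a stand-in for a real response (with the question's code you would pass connection.getInputStream() instead).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PrologPeek {

    /** Reads up to max bytes from the stream and returns them as space-separated hex. */
    static String hexPrefix(InputStream in, int max) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < max; i++) {
            int b = in.read();
            if (b < 0) {
                break; // end of stream reached before max bytes
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HTTP response body: a UTF-8 BOM followed by the XML declaration.
        byte[] body = "\uFEFF<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(new ByteArrayInputStream(body), 8));
    }
}
```

With the sample bytes above this prints EF BB BF 3C 3F 78 6D 6C, i.e. a UTF-8 BOM followed by <?x m l. Anything other than 3C (<) at the very start is worth investigating on the server side.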

MvG
  • 2
    These weird characters at the start of the file are a BOM (byte order mark). Ideally a BOM should not be present with UTF-8 encoding, as Java fails to parse such a document and gives the above error – techExplorer Apr 25 '14 at 12:07
9

There are almost certainly some weird characters (e.g. a BOM) or some whitespace before the XML preamble (<?xml ...?>).
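
If the server cannot be fixed, a client-side workaround is to strip such leading bytes before parsing. The following is a minimal sketch under stated assumptions, not a definitive fix: the class BomSkipper and the method skipLeadingJunk are hypothetical names, only a UTF-8 BOM and ASCII whitespace are handled, and the sample bytes in main stand in for the real response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class BomSkipper {

    /** Returns a stream positioned after any leading UTF-8 BOM and ASCII whitespace. */
    static InputStream skipLeadingJunk(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop until 3 or EOF.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        // Skip spaces, tabs and line breaks; push back the first other byte.
        int b;
        while ((b = pin.read()) >= 0) {
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') {
                pin.unread(b);
                break;
            }
        }
        return pin;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a response that starts with a BOM and stray whitespace.
        byte[] raw = "\uFEFF  \n<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        InputStream cleaned = skipLeadingJunk(new ByteArrayInputStream(raw));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(cleaned);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

This only hides the symptom. If the leading content is something else entirely (an HTML error page, for instance), the parser will still fail, which is usually the more useful outcome.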

Johannes Weiss
  • 2
    In my case I had wrongly added comments in the XML using Java-style comments, e.g. instead of using <!-- --> I had used /* */. Removing them fixed the error for me – Chaitanya K Jan 13 '16 at 10:29
3

I wanted YAML for the log4j2 configuration file because it doffs XML's visual clutter, but had the same error as guest86. I scoured the web for a solution, investigating a UTF-8 BOM or other content in the YAML header area; no joy. Of course, the answer is usually simple.

Somewhere, I had completely missed that using YAML with log4j2 requires the Jackson libraries, per https://www.sentinelone.com/blog/log4j2-configuration-detailed-guide/. Adding the Jackson dependencies to my (Gradle) configuration fixed the problem:

// Gain support for log4j2.
// https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j
implementation 'org.apache.logging.log4j:log4j-api:2.14.1'
implementation 'org.apache.logging.log4j:log4j-core:2.14.1'

// Gain support for YAML with log4j2.
// https://www.sentinelone.com/blog/log4j2-configuration-detailed-guide/
implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.10.0'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.10.0'

With that, the dreaded Fatal Error :1:1: Content is not allowed in prolog error went away.

1

The real solution that I found for this issue was disabling all XML format post processors. I had added a post processor called "jp@gc - XML Format Post Processor" and then started noticing the error "Fatal Error :1:1: Content is not allowed in prolog".

Disabling the post processor stopped those errors.

anoopknr
Sastry
0

Someone should mark Johannes Weiß's answer as the accepted answer to this question. That is exactly why XML documents can't just be loaded into a DOM Document class.

http://en.wikipedia.org/wiki/Byte_order_mark

smiron
0

It could be an unsupported file encoding. Try changing it to UTF-8, for example.

I did this using Sublime.
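
When re-saving the file is not an option (say, the bytes come from a server you do not control) and you know the real encoding, an alternative is to tell the parser the encoding explicitly through an InputSource. This is a sketch only: the class ExplicitEncodingParse, the helper parse, and the ISO-8859-1 sample data are assumptions for illustration, not something taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExplicitEncodingParse {

    /** Parses the byte stream, decoding it with the given encoding instead of auto-detecting. */
    static Document parse(InputStream in, String encoding) throws Exception {
        InputSource source = new InputSource(in);
        source.setEncoding(encoding);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Sample document encoded as ISO-8859-1 with no XML declaration naming the encoding.
        byte[] latin1 = "<root>caf\u00E9</root>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), "ISO-8859-1");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```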

Mike
-2

Looks like you forgot to add the correct headers to your GET request (check with the REST API developer or your specific API's documentation):

HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// Ask the server for XML; request properties must be set before connect().
connection.setRequestProperty("Accept", "application/xml");
connection.setRequestMethod("GET");
connection.connect();

or

connection.setRequestProperty("Accept", "application/xml;version=1");
Daniel Nelson