0

Most XML documents, such as RSS feeds, start with a prolog:

<?xml version="1.0" encoding="UTF-8" ?>

But what I can't understand is why this is needed, because if an application parses the XML and reads the "encoding" value, it is already reading text decoded with the application's own encoding.

Croll
  • possible duplicate of [What use is the 'encoding' in the XML header?](http://stackoverflow.com/questions/5165347/what-use-is-the-encoding-in-the-xml-header) – Brett Okken Jan 01 '15 at 17:22

1 Answer

-1

because if an application parses the XML and reads the "encoding" value, it is already reading text,

That's not necessarily true. The XML parser reads the bytes up to the first newline (which is why the XML declaration must always be on the first line of an XML file), converts them to text in order to parse the encoding, and then reads the remaining bytes using the specified encoding.
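As a rough illustration that the parser is handed bytes rather than already-decoded text, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. That parser is my own choice of example and is not mentioned in this thread. The same character is serialized under two encodings, and the `encoding` value in the declaration decides how the bytes after it are decoded.

```python
import xml.etree.ElementTree as ET

# The same character (U+00E9, "e" with acute accent) serialized two ways.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>\u00e9</name>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><name>\u00e9</name>'.encode("utf-8")

# The byte sequences differ: 0xE9 in Latin-1, 0xC3 0xA9 in UTF-8.
assert latin1_doc != utf8_doc

# The parser receives raw bytes and uses the declaration to decode them.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# Both documents yield the same one-character string.
print(latin1_text == utf8_text)
```

If the application had decoded the bytes itself with a single fixed encoding before parsing, one of the two documents would have produced the wrong character.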

Darin Dimitrov
  • Sweet. Are there any open-source XML parser implementations where I can see this behavior? Is there a specification for the first line's encoding? – Croll Jan 01 '15 at 17:19
  • The answer to the duplicate question has a link to the XML standard, which describes how this is to be done: http://stackoverflow.com/a/5165423/676877 – Brett Okken Jan 01 '15 at 17:23
  • This answer is not really accurate. The newline has nothing to do with it. If the encoding is not UTF-8 or UTF-16, a prolog is required. The prolog can be identified in the first 2 or 4 bytes of data. Basically, the goal is to brute-force the reading of ` – Brett Okken Jan 01 '15 at 19:56
  • @DmitrijA Did you read the linked XML spec, which describes how the prolog is used to "guess" the encoding? What are you still finding confusing? – Brett Okken Jan 01 '15 at 21:45
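The detection from the first few bytes mentioned in the comments can be sketched roughly as follows. This is a simplified illustration loosely modeled on the non-normative "Autodetection of Character Encodings" appendix of the XML 1.0 specification, not a conforming implementation: the function name `sniff_xml_encoding` and the tables are hypothetical, EBCDIC is left out, and for the UTF-16/UTF-32 families it returns only the family instead of going on to read the declaration.

```python
import re

# Byte order marks, with UTF-32 before UTF-16 so the longer mark wins.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# How a leading "<" or "<?" looks in wide encodings when there is no BOM.
PREFIXES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# In ASCII-compatible encodings the declaration is readable as plain ASCII.
DECLARATION = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in PREFIXES:
        if data.startswith(prefix):
            return name
    match = DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: XML defaults to UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1" ?><a/>'))
print(sniff_xml_encoding(b"\xff\xfe" + "<a/>".encode("utf-16-le")))
```

This works without knowing the encoding in advance because the declaration has to begin with the characters `<?xml`, so only a handful of byte patterns are possible at the start of the document. No newline is involved.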