
I have an online shop where vendors can upload and import their articles in two formats:

  1. plain text (tab delimited)
  2. XML

Currently I'm using XML 1.0.

However, I see there is also a version 1.1.

Wikipedia states that for most uses 1.0 will be OK: http://en.wikipedia.org/wiki/XML#Versions

It also says XML 1.0 is tied to specific Unicode versions: Unicode 2.0 to Unicode 3.2.

According to the same article, in the fifth edition XML names may contain characters in the Balinese, Cham, or Phoenician scripts, among many others that have been added to Unicode since Unicode 3.2.

Currently I only have a couple of 'Latin'-based languages, but this may change in the future and I want to be prepared.

Are there languages whose characters are not supported by Unicode 3.2? Is v1.0 safe for me to use?

If you need more info just let me know.

Abel
PeeHaa

3 Answers


Use version 1.0.

You would only need to use version 1.1 if you are using certain non-ASCII characters in identifiers, EBCDIC line ending characters, or control characters (character codes 1 to 31, other than tab, line feed and carriage return, which XML 1.0 already allows).

Rationale and list of changes for XML 1.1
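
To illustrate that the restriction concerns identifiers rather than content, here is a minimal sketch using Python's standard-library parser, which is built on Expat, an XML 1.0 parser. The helper name `element_text` and the sample documents are invented for this example; they are not taken from the question.

```python
import xml.etree.ElementTree as ET


def element_text(xml_source: str) -> str:
    """Parse an XML 1.0 document and return the root element's text."""
    return ET.fromstring(xml_source).text


# Non-ASCII characters in *content* are accepted under XML 1.0,
# including scripts added to Unicode after version 3.2.
latin = element_text('<?xml version="1.0"?><name>Café crème</name>')
# U+1B05 and U+1B06 are Balinese letters (added in Unicode 5.0).
balinese = element_text('<?xml version="1.0"?><name>\u1b05\u1b06</name>')

print(latin)
print(len(balinese))
```

Both documents parse without error, so article names and descriptions in other scripts should not by themselves force a move to 1.1, as long as the element and attribute names stay within what your parser accepts.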

Guffa
  • I understand the benefits are minor, but what are the disadvantages of specifying `version="1.1"`? Will many things suddenly not work? – ycomp Oct 23 '15 at 17:34
  • @ycomp: Yes, it can very well stop working. The support for XML 1.1 is not widely implemented. The .NET framework for example [won't read it](http://stackoverflow.com/questions/17231675/does-net-4-5-support-xml-1-1-yet-for-characters-invalid-in-xml-1-0). – Guffa Oct 23 '15 at 17:43
  • I don't understand why they had to change the version number suddenly. The first (XML 1.0) was initially defined in 1998. It has undergone several revisions since then, without being given a new version number. – NoName Nov 03 '16 at 13:12
  • @NoName "A new XML version, rather than a set of errata to XML 1.0, is being created because the changes affect the definition of well-formed documents." see the quoted rationale in the answer – Felix D. Jan 16 '18 at 21:41

XML 1.1 came out of a fanatical desire to be "inclusive" by supporting all the world's languages, including methods of writing Abyssinian that were only used for 15 years nearly a century ago. If you are one of the 99.99999% of the population who doesn't need to capture ancient manuscripts, XML 1.1 is a total waste of time.

Michael Kay
  • I think I fall in the category of the 99,99999 :) – PeeHaa Jul 31 '11 at 14:11
  • Note that this also only applies to identifiers. You can still use those characters in content. – Guffa Aug 01 '11 at 06:27
  • [Only 713 people need it?](http://wolframalpha.com/input/?i=%281-0.9999999%29%2Aworld+population) – Jimmy T. Aug 04 '15 at 08:36
  • I suspect the number of people who need to use obsolete Abyssinian characters in the names of elements and attributes is a lot lower than 713. – Michael Kay Aug 05 '15 at 07:56
  • There are some constructions, like conditional elements and assertions, that don't exist in 1.0. So, for many, it isn't a total waste of time. – Suncat2000 Oct 25 '18 at 18:13
  • @Suncat2000 you are confusing XSD 1.1 with XML 1.1. This thread is about XML versions, not XSD versions. – Michael Kay Oct 26 '18 at 08:21
  • It's not so fanatical. 1.0 didn't allow many control characters in content, not even if you escaped them; having one in your XML file means it's not "well formed". Well, thank you very much. I need to talk about that character. Does that make me a "fanatic"? There is no need to exclude characters from content just because most users don't need them. – CHKingsley Dec 05 '20 at 19:52
  • If you need to use x01 then you probably also need to use x00, which XML 1.1 doesn't allow; so you need to find a way of talking about control characters that doesn't involve putting them literally in your content. – Michael Kay Dec 06 '20 at 09:31
  • Can you please elaborate on a workaround so that users could use characters normally not "allowed" in their xml content?? (perhaps even update your answer) @MichaelKay I was thinking you use html or other text format outside the xml structure. – Jon Grah Mar 04 '22 at 04:39
  • @JonGrah Asking supplementary questions in comments isn't a good idea, especially if it's 10 years since the original post. Please raise a new question. – Michael Kay Mar 04 '22 at 07:54

Beyond non-useful things (like silly EBCDIC linefeeds), there is unfortunately one nice feature that XML 1.1 allows: the ability to use character references for Unicode/ASCII control characters other than LF/CR/Tab. You still cannot include nulls, though, even using character references.

So this is hardly enough to make one use 1.1, unless there is specific need to contain these characters.
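
A minimal sketch of what that restriction looks like in practice with an XML 1.0 parser, here Python's standard-library `xml.etree.ElementTree` (built on Expat). The helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_source: str) -> bool:
    """Return True if an XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


# Tab is one of the few control characters XML 1.0 allows,
# so a character reference to it is fine.
print(is_well_formed('<a>&#x9;</a>'))

# Other control characters are not allowed in XML 1.0,
# not even as character references.
print(is_well_formed('<a>&#x1;</a>'))

# The null character is excluded as well (and XML 1.1 excludes it too).
print(is_well_formed('<a>&#x0;</a>'))
```

Only the first document is accepted by this parser; the other two are rejected as not well formed.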

StaxMan