XML, granted, is very useful, but can be quite verbose. What alternatives are there and are they specialised for any particular purpose? Library support to interrogate the contents easily is a big plus point.
24 Answers
Jeff's article on The Angle Bracket Tax summarizes a number of alternatives (well, mainly YAML), and led me to the wiki article on lightweight markup languages.
Update: Although YAML is a possible "alternative to XML" for some applications, the two are not, as I first thought, isomorphic.
Indeed, it "ain't markup language."
Furthermore, YAML ain't as "lightweight" as it appears. For documents that can be represented in plain XML (such as Jeff's example), YAML is clearly less verbose. But YAML offers many other specialized structures, enlisting many more characters and sequences than are reserved by XML.
Bottom line, if you're looking for XML-without-angle-brackets, YAML ain't it.

Don't forget about YAML!
JSON seems to have better support though. For example, the Prototype JS library has excellent built-in JSON functions.
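Library support is not limited to JavaScript. As a minimal sketch, Python's standard `json` module round-trips a record in two calls; the `person` record here is made up purely for illustration.

```python
import json

# Hypothetical record, used only for illustration.
person = {"name": "Alex", "age": 34, "email": "mail@example.com"}

text = json.dumps(person)     # serialize the dict to a JSON string
restored = json.loads(text)   # parse the string back into a dict

print(text)
print(restored["age"] + 1)    # numbers come back as numbers, not strings
```

Interrogating the contents is then ordinary dictionary and list access, with no separate query API to learn.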

My work with XML is almost exclusively with document-centric XML, which must model long sequences of arbitrarily nested structures. I haven't used JSON yet, but my impression is that it is cumbersome to use with document-like data, but well-adapted and even elegant for use with record-like data. Consider the shape of your data when making your decision.
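To make the document-like versus record-like distinction concrete, here is a rough sketch in Python. The `to_json_shape` helper and the sample paragraph are my own illustrations, and the nested-list encoding is just one of several possible ways to force mixed content into JSON.

```python
import json
import xml.etree.ElementTree as ET

# Document-like data: text and inline elements are interleaved (mixed content).
doc = ET.fromstring("<p>Click <b>here</b> to <i>continue</i>.</p>")

def to_json_shape(elem):
    """Flatten an element into [tag, child, child, ...], preserving text order."""
    parts = [elem.tag]
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(to_json_shape(child))
        if child.tail:
            parts.append(child.tail)   # text that follows the child element
    return parts

# The document needs an ad hoc nested-list convention in JSON...
print(json.dumps(to_json_shape(doc)))

# ...whereas record-like data maps onto a JSON object directly.
print(json.dumps({"name": "Alex", "age": 34}))
```

The first output is legal JSON, but every consumer has to know the convention; the second needs no explanation.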

You could try Google's protobufs. It's much faster than XML. There are libraries for it in C, C++, C#, Java and Python (there are alpha versions for Ruby and Perl). But it is binary.

-
Protobuf is a nice streaming format for data transfers, but not a data format per se - it has zero long-term storage viability because fields are coded by numbers (from a schema). That is bound to cause problems. Protobuf should only be used when persistence is irrelevant (i.e. transfer over the network), so data is never stored long term. Which is what it was developed for. – TomTom Dec 15 '14 at 15:19
-
@TomTom: No, protobufs are perfectly fine for permanent storage, and indeed [were designed to be](https://developers.google.com/protocol-buffers/docs/overview). Google uses them for nearly everything. There are [guidelines](https://developers.google.com/protocol-buffers/docs/proto#updating) you need to follow when updating the schema, of course. – Thomas Jun 16 '16 at 07:59
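For anyone wondering what "coding fields by numbers" means in practice, here is a hand-rolled sketch of how a single integer field is laid out on the wire. It does not use the protobuf library; `encode_varint` and `encode_int_field` are illustrative helpers that cover only varint fields (wire type 0).

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: the tag (field number, wire type 0), then the value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)

wire = encode_int_field(1, 150)
print(wire.hex())   # 089601
```

The field name never appears in the bytes, only the number 1. That is why stored data stays readable as long as field numbers are never reused or renumbered, and why it becomes ambiguous if they are.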
HDF5 is a very compact data format with some characteristics that are similar to xml. The .net libraries leave a lot to be desired, but the format scales very well both in terms of size and performance.

-
+1 for a small, fast, cross-platform **binary** format. Why are all these flabby, slow, text formats so popular? (Please, everyone, don't claim XML is human-readable.) – MarkJ Oct 14 '09 at 12:22
-
Because flabby text formats tend to be stable when the data evolves, whereas binary formats get mangled when field identification numbers change. They are great for their own use, but not as a data format. – TomTom Dec 15 '14 at 15:20
TOML is the new big thing. It has the niceness of YAML without the big spec. It extends a common and familiar configuration file format. It is directly analogous to (and translatable to) JSON. It has support in all the big languages. It was created by GitHub co-founder Tom Preston-Werner and narcissistically named. It's awesome. Give it a shot!
Sample TOML:
# This is a TOML document. Boom.
title = "TOML Example"
[owner]
name = "Tom Preston-Werner"
organization = "GitHub"
bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
dob = 1979-05-27T07:32:00Z # First class dates? Why not?
[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true
[servers]
# You can indent as you please. Tabs or spaces. TOML don't care.
[servers.alpha]
ip = "10.0.0.1"
dc = "eqdc10"
[servers.beta]
ip = "10.0.0.2"
dc = "eqdc10"
[clients]
data = [ ["gamma", "delta"], [1, 2] ]
# Line breaks are OK when inside arrays
hosts = [
"alpha",
"omega"
]

-
I'm biased since I'm in the process of writing a TOML parsing library, but TOML strikes a nice balance between the ease and simplicity of JSON without the ambiguity around number types (JavaScript assumes double precision IEEE floating point numbers, but there's no standard) and the richness of YAML without the 80 page spec, with good human readability and without the verbosity of XML. – Joels Elf Aug 29 '16 at 16:07
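The number-type ambiguity is easy to demonstrate. In this small sketch, `parse_int=float` is used to imitate a parser that only has double precision; Python's own default is an exact integer.

```python
import json

big = "9007199254740993"   # 2**53 + 1, not exactly representable as a double

as_int = json.loads(big)                       # Python keeps an exact int
as_double = json.loads(big, parse_int=float)   # imitate a double-only parser

print(as_int)
print(as_double == 2**53)   # the last digit has been rounded away
```

The same JSON text therefore yields two different values depending on the reader, which is the gap a format with explicit integer and float types closes.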
If someone is looking for a less verbose alternative to XML that is more or less isomorphic to XML, there is AXON. To illustrate, consider the following examples of equivalent representations in XML and AXON. There is also a Python library, pyaxon, that supports the AXON format.
XML
<person>
<name>Alex</name>
<age>34</age>
<email>mail@example.com</email>
</person>
AXON
person {
name {"Alex"}
age {34}
email {"mail@example.com"}}
XML
<memo date="2008-02-14">
<from>
<name>The Whole World</name><email>us@world.org</email>
</from>
<to>
<name>Dawg</name><email>dawg158@aol.com</email>
</to>
<message>
Dear sir, you won the internet. http://is.gd/fh0
</message>
</memo>
AXON
memo {
date:2008-02-14
from {
name{"The Whole World"} email{"us@world.org"}}
to {
name{"Dawg"} email{"dawg158@aol.com"}}
message {"Dear sir, you won the internet. http://is.gd/fh0"}
}
XML
<club>
<players>
<player id="kramnik"
name="Vladimir Kramnik"
rating="2700"
status="GM" />
<player id="fritz"
name="Deep Fritz"
rating="2700"
status="Computer" />
<player id="mertz"
name="David Mertz"
rating="1400"
status="Amateur" />
</players>
<matches>
<match>
<Date>2002-10-04</Date>
<White refid="fritz" />
<Black refid="kramnik" />
<Result>Draw</Result>
</match>
<match>
<Date>2002-10-06</Date>
<White refid="kramnik" />
<Black refid="fritz" />
<Result>White</Result>
</match>
</matches>
</club>
AXON
club {
players {
player {
id:"kramnik"
name:"Vladimir Kramnik"
rating:2700
status:"GM"}
player {
id:"fritz"
name:"Deep Fritz"
rating:2700
status:"Computer"}
player {
id:"mertz"
name:"David Mertz"
rating:1400
status:"Amateur"}}
matches {
match {
Date{2002-10-04}
White{refid:"fritz"}
Black{refid:"kramnik"}
Result{"Draw"}}
match {
Date{2002-10-06}
White{refid:"kramnik"}
Black{refid:"fritz"}
Result{"White"}}}}

S-Expressions work great if you don't need to apply attributes to elements. Another alternative is YAML.
XML is often used for configuration, and in this case there are some other simple storage formats that are often used (less document oriented):
There are various ways of reading and writing both, depending on platform and language.
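To show how little machinery S-expressions need, here is a toy reader in Python. It is only a sketch: `tokenize` and `parse` are illustrative names, and quoted strings containing spaces are not handled.

```python
def tokenize(text):
    """Split an S-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume tokens from the front and build nested Python lists."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)        # discard the closing ")"
        return items
    try:
        return int(token)    # integers become ints
    except ValueError:
        return token         # everything else stays a symbol string

tree = parse(tokenize("(person (name Alex) (age 34))"))
print(tree)
```

There is no natural slot for attributes in this structure, which is the limitation mentioned above; you would have to adopt a convention such as a leading list of key/value pairs.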

What do you want to do with the data? Store it? Pass it around? Display it? These questions should drive your search for an appropriate technology. Simply asking how you should format your data is like asking what language you should program in, without specifying what you want to accomplish.
For most data tasks, well Dr. Codd has the cure: http://en.wikipedia.org/wiki/Edgar_F._Codd. Databases should be able to do just about anything you have in mind.
If you're passing it around, I advocate plain text. When you roll your own binary format your data goes away when your parser goes away.
With plain text, the deeper question is where to put the metadata. Should it be external to the data file, or internal ("self-describing")?
For example, XML is plain text, but so is source code. With a source file, there is a specification that goes into great detail about the syntax and semantics, while XML is supposed to be self-describing. The problem is that it isn't. Furthermore, it evolved right out of document presentation and markup, but is now being abused for all sorts of data serialization, transfer, and storage.

-
No, I like the open-endedness of this question. It gives coders a general overview of choices for any situation. And I, for one, do want a single programming language that is suited for anything I want to accomplish (someday it will exist. Someday.) – Qwertie Dec 03 '13 at 22:47
Simple Declarative Language is a nice alternative to XML for common tasks such as serialization and configuration. It provides a C# and Java parser library. I think it excels at specifying all kinds of metadata without the XML verbosity.

But at what cost?
I'm all for JSON in many situations, especially where weight or client-side work is a concern, but moving away from XML loses readability (so important in those config files) and the power of tomorrow's problem solutions like XSLT and XPath. Be really sure why and when you move away: it's a de facto standard for a reason.
(aside: my habit is to use XML internally, and transform that to JSON where that's the desired output)
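Here is a rough sketch of that habit in Python, using a cut-down version of the chess club XML from the AXON answer. Note that `xml.etree.ElementTree` supports only a limited subset of XPath; attribute predicates like the one below are within that subset.

```python
import json
import xml.etree.ElementTree as ET

club = ET.fromstring("""
<club>
  <players>
    <player id="kramnik" name="Vladimir Kramnik" rating="2700" status="GM" />
    <player id="fritz" name="Deep Fritz" rating="2700" status="Computer" />
    <player id="mertz" name="David Mertz" rating="1400" status="Amateur" />
  </players>
</club>
""")

# Query the XML internally with an XPath-style expression...
grandmasters = club.findall(".//player[@status='GM']")
print([p.get("name") for p in grandmasters])

# ...and transform to JSON only where that is the desired output.
players_json = json.dumps([dict(p.attrib) for p in club.iter("player")])
print(players_json)
```

One thing lost in the transformation: every attribute value is a string in XML, so `rating` comes out as `"2700"` unless you convert it yourself.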

Heresy! XML is king of data. Say no to the usurpers, off with their heads! Long live XML!
But seriously, if you just need data, use JSON for its support and elegance; if you need formatting, XPath-like queries, additional metadata, etc., stick with XML.
Note: I use XML for configs, system building, code generation and similar tasks, but JSON for RPC, SQL for queries and persistence, and finally YAML here and there for logging and quick tasks. In other words, choose the appropriate format for the need.

I wouldn't dismiss plain text, like CSV or tab-delimited.
I'm really looking for alternatives that have a defined structure and (cross platform, multi language) library support. I'm interested in looking at different designs and their pros and cons. I like the idea of formats that can have a text and "binary" (compact, "compiled", fast I/O, smaller footprint) format. The advantage of having libraries is that they perform the parsing and perhaps extra data manipulation/validation for you.
Although having said that, there is definitely a use for simple formats like .ini, .plist and CSV etc. You shouldn't always have to use a hammer to crack a nut.
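For the simple formats, library support is often already in the standard library. A minimal Python sketch, with made-up sample data, reading CSV and an .ini-style file:

```python
import configparser
import csv
import io

# CSV: each row becomes a dict keyed by the header line.
csv_text = "name,rating\nKramnik,2700\nMertz,1400\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["name"], int(rows[0]["rating"]))

# INI: sections and keys, with typed getters for the common cases.
ini_text = "[database]\nserver = 192.168.1.1\nconnection_max = 5000\n"
parser = configparser.ConfigParser()
parser.read_string(ini_text)
connection_max = parser.getint("database", "connection_max")
print(parser["database"]["server"], connection_max)
```

Neither format carries type information, so the conversion to `int` is the caller's job in both cases.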

JSON can be used in many ways, but I find it particularly well suited to use with MySQL tables. It works very well with Android as well (GSON library or JSON). Beyond that, it's effective at transmitting small bits of data individually or as arrays.

For storing code-like data, LES (Loyc Expression Syntax) is a budding alternative. I've noticed a lot of people use XML for code-like constructs, such as build systems which support conditionals, command invocations, sometimes even loops. These sorts of things look natural in LES:
// LES code has no built-in meaning. This just shows what it looks like.
[DelayedWrite] // an "attribute"
Output(
if version > 4.0 {
$ProjectDir/Src/Foo;
} else {
$ProjectDir/Foo;
}
);
It doesn't have great tool support yet, though; currently the only LES library is for C#. Currently only one app is known to use LES: LLLPG.
In theory you could use LES for data or markup, but there are no standards for how to do that:
body {
'''Click here to use the World's '''
a href="http://google.com" {
strong "most popular"; " search engine!"
};
};
point = (2, -3);
tasteMap = { "lemon" -> sour; "sugar" -> sweet; "grape" -> yummy };

For the sake of mentioning... have a look at my proposal:
It is very simple and is not overloaded with a variety of special symbols, basically just {} and "".
Supports C++ style comments.
There are C++, C# and Java libraries.
Example:
"String object"
AnotherStringObject
"String with children"{
"child 1"
Child2
"child three"{
SubChild1
"Subchild two"
Property1 {Value1}
"Property two" {"Value 2"}
//comment
/* multi-line
comment */
"multi-line
string"
"Escape sequences \" \n \r \t \\"
}
}

XML is OK for text markup, but it is quite a bad option for serializing general structures, where JSON is much better suited.

If you're asking from the perspective of a DSL, Guile Scheme could help, as already suggested with the S-expressions.
Personally I also use JSON for AJAX transactions.