36

Assume I'm starting a project from scratch that does not depend on any other project. I would like to use a format such as XML to store feeds. Since XML is not the only available format of its kind, I would like to know: why should I choose one over the rest?

I will be using Perl.

A 'feed' is a description of a product (name, price, type, and a short description of up to 120 words).

Jesse Nickles
snoofkin
  • 6
    What language are you using? One practical issue is going to be the availability of a good processing library. – Tim Yates Oct 16 '10 at 22:23
  • I will be using Perl. A 'feed' is a description of a product (name, price, type, and a short description of up to 120 words). – snoofkin Oct 16 '10 at 22:36
  • 1
    Where would you be storing the data? In a worker queue? a database? a file? – Ether Oct 16 '10 at 22:53
  • Related: http://stackoverflow.com/questions/1876735/should-i-use-yaml-or-json-to-store-my-perl-data?lq=1 – nawfal Jul 29 '15 at 13:05
  • Related: https://stackoverflow.com/questions/4541570/what-is-the-best-way-to-keep-an-almost-static-data-for-web-application – Jesse Nickles Jul 22 '22 at 16:43

10 Answers

35

We can't really answer that without knowing a lot more. You're not currently dependent on any other projects, but are you likely to interact with them at some point in the future? If so, what technologies do they prefer? At the BBC, we've had some "JSON-only" projects, only to find out that Java developers who wanted to access our API were begging us to provide a simple XML API, simply because they have so many tools built around XML. They didn't even care about namespaces, attributes, or anything else; they just wanted those angle brackets.

As for "storing feeds", I'm also not sure what you mean there. You explain the data in the feed, but what are you then going to do with those feeds? Parse them? Cache and re-serve them? Write them out to cuneiform tablets? :)

It sounds like what you actually want is a database: persist the data there and later make it serialisable as JSON, YAML, XML, or whatever your desired format is. What I'd recommend is to be able to pull the data out into a Perl data structure and then have "formatters" which know how to serialise that data structure to the desired output. That way you can serialise to, say, JSON, and later, if that's not good enough, easily switch to YAML or something else. In fact, if others need your data (one-way data tends not to be useful), they can ask for JSON, YAML, XML, or whatever. You have more flexibility and aren't tied to a decision that you made up front.
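Here is a minimal sketch of the "formatters" idea. It is written in Python rather than Perl purely for illustration, and the `feed` record, the `to_json`/`to_xml` functions, and the `FORMATTERS` registry are all hypothetical names, not part of any particular library. The point is only that the data lives in one native structure and each output format is a small, swappable function.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical product "feed" with the fields named in the question.
feed = {
    "name": "Widget",
    "price": 9.99,
    "type": "gadget",
    "description": "A short description of the product.",
}


def to_json(record):
    """Serialise the record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_xml(record):
    """Serialise the record as a <feed> element with one child per field."""
    root = ET.Element("feed")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Registry of formatters; supporting another format means adding one entry.
FORMATTERS = {"json": to_json, "xml": to_xml}


def serialise(record, fmt):
    """Look up the formatter registered for `fmt` and apply it."""
    return FORMATTERS[fmt](record)
```

Callers would then ask for `serialise(feed, "json")` or `serialise(feed, "xml")` without caring how the data is stored. Note that the XML formatter turns every value into text, which is one of the subtle differences between the formats.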

That being said, I don't know your system, so it's tough to say what the right thing to do is. Also, note that JSON and YAML aren't exactly interchangeable with XML. Subtle differences can and will trip you up.

Ovid
28

Each will do the job.

JSON has the advantage of super-easy parsing in JavaScript, though you'll probably have to find and introduce a library in other languages.

XML has the advantage that more languages bundle the relevant libraries, and is useful for the storage you mention. So, it is valuable for passing around through different systems, both "in-motion" and "at-rest".

YAML has libraries for all languages, but is somewhat less commonly used, so you are more likely to have to find and introduce a library.

Joshua Fox
  • You'll have to introduce libraries for either YAML or JSON in any language, I think. But Perl's YAML libraries are good - YAML came out of the Perl world. – Tom Anderson Oct 19 '10 at 14:32
22

I think XML has been thoroughly explained by the others. However, YAML and JSON are both elegant languages, and they are not as similar as you might believe at first glance.

Some of the particularities about YAML

  • References

    - person: &id001
        name:   James
        age:    5.0
    
    - person: *id001
    

    The second person is an associative array equal to the first.

  • Casting data types

    foobar: !!str 123
    

    foobar is "123" (type string).

  • Uncommon data types not supported by every implementation

    Wikipedia:

    Particularly interesting ones [...] are sets, ordered maps, timestamps, and hexadecimal.

Therefore, I consider JSON a lot simpler.

An argument for JSON

Not just for JavaScript

While it might seem stupid to use the "JavaScript Object Notation" for your application if you don't use JavaScript, you should really consider it anyway, because the data types offered in JSON are probably the most common ones in your language too.
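To make that concrete, here is a small sketch (in Python, as a stand-in for whatever language you use) showing how each JSON value type lands on a native type. The document in `text` is made up for illustration.

```python
import json

# Hypothetical feed document exercising every JSON value type.
text = (
    '{"name": "Widget", "price": 9.99, "stock": 3, '
    '"tags": ["a", "b"], "discontinued": false, "supplier": null}'
)

data = json.loads(text)

# Record the native type each JSON value was decoded into.
types = {key: type(value).__name__ for key, value in data.items()}
```

Objects become dictionaries, arrays become lists, and the scalar types map onto strings, numbers, booleans, and the language's null value. Perl's JSON modules do the analogous thing with hashes and arrays.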

Readable, even if the whitespace is optional

I think JSON is very readable once prettified, which is very easy to do. YAML is difficult to make compact, since it relies on whitespace. Granted, you should rely on compression for saving bandwidth. The references in YAML might save you a few bytes, but they add a lot of complexity. If you are really dealing with amounts of data that make it important to avoid duplication, I'd suggest solving that problem on a whole other level. Not even XML supports this kind of macro.
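As a rough illustration of how cheap it is to switch between the two forms, here is a sketch using Python's standard `json` module (the `feed` record is hypothetical). The same data is written once compactly and once prettified.

```python
import json

# Hypothetical product record.
feed = {"name": "Widget", "price": 9.99, "type": "gadget"}

# Compact form: no whitespace between tokens.
compact = json.dumps(feed, separators=(",", ":"))

# Prettified form: one key per line, indented by two spaces.
pretty = json.dumps(feed, indent=2)
```

Both strings parse back to the same structure, so you can store or transmit the compact one and prettify only when a human needs to read it.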

Janus Troelsen
11

Choose XML if you need to interoperate with systems you don't control (XML Schema is invaluable here), if you will be transforming the data extensively into text, HTML, or XML (haters notwithstanding, XSLT is peerless), if your data includes a lot of text markup, if your data needs to be human-editable (haters notwithstanding, editable XML that's validated against a schema is a pretty good tool for a lot of jobs), and/or if you need to interoperate with any of the myriad tools and technologies that work with XML.

Choose JSON if you really can't be bothered with any of the above.

Choose YAML if you're working in an environment that's got a lot of YAML support.

Robert Rossney
7

I agree with Joe. For example, if it's a JavaScript app, JSON would be a strong candidate. Personally, I'd go with JSON for just about anything, but only because it's the one I'm most comfortable with.

orolo
  • 1
    Agreed - I avoid XML until forced to use it. Key-value pairs work just fine for most usages. – Joe Oct 16 '10 at 22:31
7

JSON would be my pick. JSON and YAML are lightweight and easy to get started with (no formal schema required). JSON is more widely used and more compatible with various other technologies than YAML. For example, PHP has built-in functions to decode and encode JSON, but not YAML. JavaScript of course just loves JSON, considering it’s a strict subset of valid JavaScript.

Alan H.
6

Depends on your needs. For small, lightweight apps I personally think XML is overkill: http://www.codinghorror.com/blog/2008/05/xml-the-angle-bracket-tax.html

I prefer YAML in that case. For interaction with JavaScript, use JSON. If you truly need to define your own grammar (read: schema), then XML is it; it's very powerful. You have to decide what you are trying to do, otherwise your question is too broad to give a definitive answer.

Michael Petrotta
Joe
3

If the data isn't hierarchical and isn't going to have markup interspersed in, e.g., the description (This product is great for <targetDemo/> who love its <featureSet/>), you may want to consider comma-separated values (CSV) or some other format such as tab-separated values.

It's old school, but it gets the job done without weighing your file down with a bunch of descriptive text. For example, in XML you'd have the following non-value data for each feed.

<feed name="" price="" type="" description=""/>

...contrasted with CSV:

"", , "", ""

If you want, you can add a header row at the top for documentation purposes.

There's also plenty of tooling around CSV, from command line utilities like awk to GUIs such as Excel.
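As a sketch of what that looks like in practice (Python's standard `csv` module here, standing in for Perl's CSV modules; the field list and sample row are made up), the following writes a header row plus one feed and reads it back:

```python
import csv
import io

FIELDS = ["name", "price", "type", "description"]

# Hypothetical row; the description holds a comma and quotes to exercise quoting.
rows = [
    {
        "name": "Widget",
        "price": "9.99",
        "type": "gadget",
        "description": 'Small, sturdy, and "great" value.',
    },
]

# Write to an in-memory buffer; a real program would open a file with newline="".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read it back: DictReader uses the header row as the keys of each record.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Two things worth noticing: a proper CSV library takes care of quoting commas and quotes inside the description, and everything comes back as a string, so the price has to be converted to a number by your own code.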

Another alternative, if you don't really need the data to be editable via a text editor but don't want to deploy a more robust database service, would be SQLite which allows you to perform RDBMS-style CRUD operations on a flat binary file.
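A minimal sketch of that option, using Python's built-in `sqlite3` module for illustration (Perl would go through DBI with the SQLite driver). The `feed` table and its columns are assumptions based on the fields in the question.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed ("
    "name TEXT PRIMARY KEY, price REAL, type TEXT, description TEXT)"
)

# Create
conn.execute(
    "INSERT INTO feed VALUES (?, ?, ?, ?)",
    ("Widget", 9.99, "gadget", "A short description."),
)
# Update
conn.execute("UPDATE feed SET price = ? WHERE name = ?", (7.5, "Widget"))
# Read
row = conn.execute(
    "SELECT name, price, type FROM feed WHERE name = ?", ("Widget",)
).fetchone()
# Delete
conn.execute("DELETE FROM feed WHERE name = ?", ("Widget",))
remaining = conn.execute("SELECT COUNT(*) FROM feed").fetchone()[0]

conn.close()
```

Unlike CSV, the price keeps its numeric type, and you can query by field without loading the whole file.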

steamer25
1

In the absence of interoperability concerns, I don't think there's much in it. There are good libraries for all of them in all languages; some of them are built-in, some aren't. Your interface to those libraries will be narrow - just in data-access code - so if one has a painful API, even that doesn't matter much.

JSON is, for me, the most pleasant to edit by hand, which is a small plus.

YAML can handle non-tree data structures using the &/* notation. Neither XML nor JSON has a built-in way to do that. Your use doesn't need it, though.
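To show what "non-tree" means on the JSON side, here is a small sketch with Python's standard `json` module (the `person` record is made up). A shared object gets written out twice, and a cycle can't be written at all with the default settings.

```python
import json

person = {"name": "James", "age": 5.0}

# Two list entries pointing at the same object, as a YAML anchor/alias pair would.
people = [person, person]
shared_before = people[0] is people[1]

# JSON writes the object out twice, so the round trip yields two independent copies.
restored = json.loads(json.dumps(people))
shared_after = restored[0] is restored[1]

# A cyclic structure is rejected by the encoder's default circular-reference check.
cycle = {"name": "loop"}
cycle["self"] = cycle
try:
    json.dumps(cycle)
    cycle_error = None
except ValueError as exc:
    cycle_error = str(exc)
```

For flat product records like the ones in the question, none of this comes up, which is why it's only a minor point here.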

Tom Anderson
0

I think XML is for big data and JSON is for small, not-too-complex data that doesn't need multidimensional arrays. I might be wrong. ^^ And I have only seen YAML in Google App Engine, where it seems quite suitable for storing an application's preferences and data.

wizztjh