
I have an ArrayList of objects that is dumped to a YAML string, and I have been comparing how JYaml and SnakeYAML perform at this.

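    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // jYaml and snakeYaml are the JYaml and SnakeYAML dumpers being compared
    // (their setup is not shown here).
    List<Map<String, String>> testList = new ArrayList<Map<String, String>>();
    Map<String, String> testMap1 = new HashMap<String, String>();
    Map<String, String> testMap2 = new HashMap<String, String>();

    testMap1.put("1_1", "One");
    testMap1.put("1_2", "Two");
    testMap1.put("1_3", "Three");

    testMap2.put("2_1", "One");
    testMap2.put("2_2", "Two");
    testMap2.put("2_3", "Three");

    testList.add(testMap1);
    testList.add(testMap2);

    // Print both serialisations of the same list for comparison
    System.out.println(jYaml.dump(testList));
    System.out.println(snakeYaml.dump(testList));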


The output from JYaml includes the serialised object's class name, whereas the output from SnakeYAML does not:

JYaml output:

    - !java.util.HashMap
      1_1: One
      1_3: Three
      1_2: Two
    - !java.util.HashMap
      2_1: One
      2_2: Two
      2_3: Three

SnakeYAML output:

    - {'1_1': One, '1_3': Three, '1_2': Two}
    - {'2_1': One, '2_2': Two, '2_3': Three}


I prefer SnakeYAML's cleaner output without class names, as it is better suited to a language-neutral environment.

However, I prefer the speed of JYaml: its serialisation/deserialisation times increase linearly with the amount of data being processed, whereas SnakeYAML's times grow exponentially.

I'd like to coerce JYaml into producing output without class names, but I'm quite lost as to how to achieve this.

Jon Cram
  • Why not create a ticket for SnakeYAML? (http://trac-hg.assembla.com/snakeyaml/report/1) Once the performance problem is reported and identified, it can be fixed. SnakeYAML's output is very flexible; check http://instantyaml.appspot.com/ (log in to see the options). – Andrey Feb 10 '09 at 16:50
  • The problem has been identified: the regular expressions in SnakeYAML do not scale. JYaml does not use regular expressions to find the proper type at all (in JYaml a scalar is always a String). SnakeYAML could follow the same approach. Is that what you expect? – Andrey Feb 19 '09 at 14:14
  • @andrey: Thanks for pointing that out. I'm now more curious: is the use of regular expressions, in the way you explain, required such that /any/ well-written dumper/loader would exhibit the same performance issues? – Jon Cram Feb 19 '09 at 14:23
  • @andrey: "It is possible to follow the same approach in SnakeYAML. Is it what you expect?". Sorry, you lost me a bit there. What is it you're asking? – Jon Cram Feb 19 '09 at 14:24

2 Answers


How do you measure the speed? What do you mean by 'amount of data'? Is it the size of a YAML document or the number of documents?
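One way to make "amount of data" concrete is to time a single dump while doubling the number of maps in the list: if the time roughly doubles each step the growth is linear, and if it grows much faster it is not. The sketch below is a hypothetical harness, not code from either library. `DumpTiming`, `buildData` and `timeDumpMillis` are made-up names, and the `dumper` function is a placeholder (here `String::valueOf`, so it runs without any YAML library) that you would replace with a lambda wrapping the JYaml or SnakeYAML call under test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DumpTiming {

    // Builds a list of `count` maps shaped like the ones in the question.
    static List<Map<String, String>> buildData(int count) {
        List<Map<String, String>> list = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            Map<String, String> map = new HashMap<>();
            map.put(i + "_1", "One");
            map.put(i + "_2", "Two");
            map.put(i + "_3", "Three");
            list.add(map);
        }
        return list;
    }

    // Times one call of `dumper` on `data` and returns the elapsed milliseconds.
    // `dumper` is a placeholder for the serialiser being measured.
    static double timeDumpMillis(Function<Object, String> dumper, Object data) {
        long start = System.nanoTime();
        String out = dumper.apply(data);
        long elapsed = System.nanoTime() - start;
        if (out == null) {
            throw new IllegalStateException("dumper returned null");
        }
        return elapsed / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in dumper (assumption): swap in the real YAML dump call here.
        Function<Object, String> dumper = String::valueOf;
        for (int size = 1_000; size <= 16_000; size *= 2) {
            List<Map<String, String>> data = buildData(size);
            dumper.apply(data); // warm-up run so JIT compilation is not timed
            double ms = timeDumpMillis(dumper, data);
            System.out.printf("%6d maps: %8.2f ms%n", size, ms);
        }
    }
}
```

A single timed run per size is noisy on the JVM; repeating each measurement several times and taking the median would give a more reliable curve.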

The JYaml output is incorrect. According to the specification, underscores in numbers are ignored, so the plain scalar 1_1 means the integer 11 (at least in YAML 1.1). Because the key is in fact a String and not an Integer, it should be represented as:

    '1_1': One

or, canonically:

    !!str "1_1": !!str "One"

Otherwise, when the document is parsed, the loader will create a Map<Integer, String> instead of a Map<String, String>.
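To see why the unquoted key changes type, here is a minimal sketch of the decimal branch of the YAML 1.1 `int` regular expression, `[-+]?(0|[1-9][0-9_]*)`. It is an illustration only, not code from JYaml or SnakeYAML: the class and method names are hypothetical, the binary, octal, hex and base-60 forms of the 1.1 `int` type are left out, and other implicit types (booleans such as `yes`, floats, nulls) would also need quoting in a real emitter.

```java
import java.util.regex.Pattern;

public class Yaml11IntCheck {

    // Decimal branch of the YAML 1.1 int regexp; the other bases are omitted.
    private static final Pattern DECIMAL_INT =
            Pattern.compile("[-+]?(0|[1-9][0-9_]*)");

    // True if a plain (unquoted) scalar would be resolved as a decimal int.
    static boolean resolvesAsInt(String plainScalar) {
        return DECIMAL_INT.matcher(plainScalar).matches();
    }

    // The value a YAML 1.1 loader would produce: underscores are ignored.
    static long toInt(String plainScalar) {
        return Long.parseLong(plainScalar.replace("_", ""));
    }

    // Quote a string key only when it would otherwise be read back as an int.
    static String emitStringKey(String key) {
        return resolvesAsInt(key) ? "'" + key + "'" : key;
    }

    public static void main(String[] args) {
        System.out.println(resolvesAsInt("1_1")); // true
        System.out.println(toInt("1_1"));         // 11
        System.out.println(emitStringKey("1_1")); // '1_1'
        System.out.println(emitStringKey("One")); // One
    }
}
```

This is why SnakeYAML writes `'1_1'` with quotes while `One` stays plain: only the scalar that matches an implicit type needs quoting to remain a String.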

JYaml has many open issues and does not implement the complete YAML 1.1 specification.

JYaml may indeed be faster, but that is due to its simplified parsing and emitting.

Andrey

Check the latest SnakeYAML source. It is now possible (as in JYaml) to ignore implicit typing and always parse scalars as Strings, which is a few times faster. Look here and here to see how to use the new feature.

(With the regular expressions switched off, serialisation/deserialisation times increase linearly with the amount of data being processed.)
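The idea behind "ignoring implicit typing" can be sketched as a resolver with an on/off switch: with typing on, every plain scalar is tested against a list of regular expressions until one matches; with it off, no regex runs and every scalar is simply a String. This is a hypothetical illustration, not SnakeYAML's actual `Resolver` class or API. `ScalarResolverSketch` and its patterns are assumptions, simplified from the YAML 1.1 `bool`, `int`, `float` and `null` types.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ScalarResolverSketch {

    // Simplified YAML 1.1 implicit types, tried in insertion order.
    // Real resolvers use more complete patterns than these.
    private static final Map<String, Pattern> IMPLICIT = new LinkedHashMap<>();
    static {
        IMPLICIT.put("bool", Pattern.compile(
                "yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF"));
        IMPLICIT.put("int", Pattern.compile("[-+]?(0|[1-9][0-9_]*)"));
        IMPLICIT.put("float", Pattern.compile("[-+]?([0-9][0-9_]*)?\\.[0-9.]*([eE][-+][0-9]+)?"));
        IMPLICIT.put("null", Pattern.compile("~|null|Null|NULL|"));
    }

    private final boolean implicitTyping;

    ScalarResolverSketch(boolean implicitTyping) {
        this.implicitTyping = implicitTyping;
    }

    // Returns the name of the type a plain scalar resolves to.
    String resolve(String plainScalar) {
        if (!implicitTyping) {
            return "str"; // no regex work at all: every scalar is a string
        }
        for (Map.Entry<String, Pattern> e : IMPLICIT.entrySet()) {
            if (e.getValue().matcher(plainScalar).matches()) {
                return e.getKey();
            }
        }
        return "str"; // nothing matched, so the scalar stays a string
    }

    public static void main(String[] args) {
        ScalarResolverSketch typed = new ScalarResolverSketch(true);
        ScalarResolverSketch stringsOnly = new ScalarResolverSketch(false);
        for (String s : new String[] {"1_1", "One", "yes", "3.14"}) {
            System.out.println(s + " -> " + typed.resolve(s) + " / " + stringsOnly.resolve(s));
        }
    }
}
```

Running it prints `1_1 -> int / str`, `One -> str / str`, `yes -> bool / str` and `3.14 -> float / str`. The typed resolver may try every pattern for every plain scalar, while the strings-only mode does constant work per scalar, at the cost of losing the type information that the quoting rules above rely on.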

Andrey