
I have written code to serialize objects to JSON and BSON. According to my output, the BSON produced is greater in size than the JSON. Is this expected?

From my Bson class (using Jackson and bson4jackson):

private ByteArrayOutputStream baos = new ByteArrayOutputStream();
private BsonFactory fac = new BsonFactory();

private ObjectMapper mapper = new ObjectMapper(fac);

public Bson(Object obj) throws JsonGenerationException,
        JsonMappingException, IOException {
    mapper.writeValue(baos, obj);
}

public int size() {
    return baos.size();
}

public String toString() {
    byte[] bytes = baos.toByteArray();
    return new String(bytes);
}

From my Json class:

private ByteArrayOutputStream baos = new ByteArrayOutputStream();
private ObjectMapper mapper = new ObjectMapper();

public Json(Object obj) throws JsonGenerationException,
        JsonMappingException, IOException {
    mapper.writeValue(baos, obj);
}

(size() and toString() as above)

My POJOs are Person.class and Address.class.

In my main class:

    Address a = new Address("Jln Koli", "90121", "Vila", "Belgium");
    Person p = new Person("Ali Bin Baba", new Date(), 90.0, 12, a);

    List<Person> persons = new LinkedList<>();
    persons.add(p);
    persons.add(p);

    Bson bson = new Bson(persons);
    Json json = new Json(persons);
    System.out.println("Bson : " + bson.size() + ", data : " + bson.toString());
    System.out.println("Json : " + json.size() + ", data : " + json.toString());

The output:

Bson : 301, data : -
Json : 285, data : [{"name":"Ali Bin Baba","birthd...

My questions:

  1. Is this output correct, or is my code wrong?
  2. Any suggestions for how to test and compare the sizes of BSON and JSON?
ST3
Auf
  • I would try decoding the data generated. I would assume BSON is faster to decode, not more compact. – Peter Lawrey Jun 09 '14 at 07:01
  • If you want a dense format and are using Java, definitely go for protocol buffers; they're a lot more compact than either and are much faster, though they require a schema. – Benjamin Gruenbaum Jun 09 '14 at 11:51
  • You should have stated your question more generically, e.g. "Is there an efficient binary representation format for JSON?" Then you probably would have got MessagePack (https://en.wikipedia.org/wiki/MessagePack) as an answer. BSON is actually not very good in encoding speed, message size or JSON compatibility. MessagePack is better in all regards, but due to its name it's not immediately associated with being a binary JSON format, which is the one respect in which BSON does better. – David Ongaro Jul 11 '18 at 18:34

2 Answers


From the BSON FAQ:

BSON is designed to be efficient in space, but in many cases is not much more efficient than JSON. In some cases BSON uses even more space than JSON. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, like length prefixes, that make it easy and fast to traverse.

BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.

For a string field, the overhead in JSON is 6 bytes -- 4 quotes, a colon and a comma. In BSON it's 7 -- an entry type byte, a null terminator for the field name, a 4-byte string length and a null terminator for the value.

For an integer field, the JSON length depends on the size of the number. "1" is just one byte. "1000000" is 7 bytes. In BSON both of these would be a 4-byte, 32-bit integer. The situation with floating point numbers is similar.
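The per-field arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, the JSON figures assume compact output with a trailing comma counted (as in the overhead count above) and no characters that need escaping, and the BSON figures cover a single element, not the enclosing document's length prefix and terminator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Jackson or bson4jackson.
final class FieldSizeSketch {

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // "name":value,  -> 2 quotes + colon + comma = 4 bytes of overhead
    static int jsonIntFieldSize(String name, int value) {
        return utf8Length(name) + 4 + Integer.toString(value).length();
    }

    // type byte + name + NUL + fixed 4-byte int32 value
    static int bsonInt32FieldSize(String name) {
        return 1 + utf8Length(name) + 1 + 4;
    }

    // "name":"value",  -> 4 quotes + colon + comma = 6 bytes of overhead
    static int jsonStringFieldSize(String name, String value) {
        return utf8Length(name) + utf8Length(value) + 6;
    }

    // type byte + name + NUL + 4-byte length + value + NUL = 7 bytes of overhead
    static int bsonStringFieldSize(String name, String value) {
        return 1 + utf8Length(name) + 1 + 4 + utf8Length(value) + 1;
    }

    public static void main(String[] args) {
        for (int value : new int[] {1, 1000000}) {
            System.out.println("n=" + value
                    + "  JSON: " + jsonIntFieldSize("n", value)
                    + "  BSON: " + bsonInt32FieldSize("n"));
        }
        System.out.println("foo=bar  JSON: " + jsonStringFieldSize("foo", "bar")
                + "  BSON: " + bsonStringFieldSize("foo", "bar"));
    }
}
```

With a one-character field name, the small integer is smaller in JSON and the seven-digit integer is smaller in BSON, which is the crossover the paragraph above describes.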

BSON is not intended to be smaller. It is intended to be closer to the structures that computers work with natively, so that it can be worked with more efficiently -- that is one meaning of "light".

If you're not chasing extreme levels of performance (as the MongoDB developers who designed BSON are), then I would advise using JSON -- the human-readability is a great benefit to the developer. As long as you use a library like Jackson, migrating to BSON later should not be hard -- as you can see from how nearly identical your own Bson and Json classes are.

Bear in mind that if size is an issue, both JSON and BSON should compress well.
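One way to check the compression point is to gzip the serialized bytes and compare sizes before and after. The sketch below uses only the JDK; the class name, method name and the sample payload are invented for illustration, and in practice you would pass in the byte arrays produced by your own JSON and BSON serializers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
final class CompressedSizeSketch {

    // Returns the number of bytes the input occupies after gzip compression.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        // Made-up sample: a JSON array with the same record repeated,
        // similar in spirit to serializing a list of identical objects.
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 50; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":\"Ali Bin Baba\",\"weight\":90.0,\"age\":12}");
        }
        sb.append(']');
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + gzipSize(raw));
    }
}
```

Note that gzip adds a fixed header and trailer, so very small payloads can grow when compressed; the comparison is more meaningful on realistic data volumes.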

slim

The property "foo":"bar" consumes 11 bytes in UTF-8 encoded JSON. In BSON it consumes 13:

bytes       description
============================================
1           entry type value \x02
3           "foo"
1           NUL \x00
4           int32 string length (4 -- includes the NUL)
3           "bar"
1           NUL \x00

There are many cases in which JSON will be more compact.
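The byte layout in the table can be reproduced by hand to confirm the counts. The following is a minimal sketch that encodes only a single BSON string element (not a complete BSON document, which would also carry a 4-byte length prefix and a trailing NUL); the class and method names are hypothetical and it does not use bson4jackson.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class BsonStringElementSketch {

    // Encodes one BSON string element: type, name, NUL, int32 length, value, NUL.
    static byte[] bsonStringElement(String name, String value) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer
                .allocate(1 + n.length + 1 + 4 + v.length + 1)
                .order(ByteOrder.LITTLE_ENDIAN); // BSON integers are little-endian
        buf.put((byte) 0x02);      // entry type: UTF-8 string
        buf.put(n);                // field name
        buf.put((byte) 0x00);      // NUL terminating the field name
        buf.putInt(v.length + 1);  // string length, including the trailing NUL
        buf.put(v);                // string value
        buf.put((byte) 0x00);      // NUL terminating the value
        return buf.array();
    }

    // Encodes the same property as compact JSON text, without a trailing comma.
    static byte[] jsonStringProperty(String name, String value) {
        String text = "\"" + name + "\":\"" + value + "\"";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("JSON bytes: " + jsonStringProperty("foo", "bar").length);
        System.out.println("BSON bytes: " + bsonStringElement("foo", "bar").length);
    }
}
```

The JSON helper assumes the name and value contain no characters that require escaping.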

slim
McDowell