
bson.son.SON is used mainly in pymongo to get an ordered mapping (dict).

But Python already has an ordered dict in collections.OrderedDict.

I have read the docs for bson.son.SON. They say SON is similar to OrderedDict but do not mention the difference.

So what is the difference? When should we use SON, and when should we use OrderedDict?

Kramer Li
  • According to the docs, `SON provides an API similar to collections.OrderedDict from Python 2.7+.` You can find more details by using `dir` and reading the source. – Brown Bear Jul 16 '18 at 06:55
  • Thanks @BearBrown. The docs do not mention the difference. What I want to know is why we have SON when Python already has OrderedDict. – Kramer Li Jul 16 '18 at 07:12

1 Answer


Currently, the one slight difference between the two is that bson.son.SON remains usable on Python versions older than 2.7, which do not have collections.OrderedDict. The argument that SON serializes faster than OrderedDict also no longer holds as of 2018.

The son module was added on Jan 8, 2009.

collections.OrderedDict (PEP 372) was added to Python on Mar 2, 2009.

While the difference in dates doesn't tell us which was released first, it is interesting that MongoDB had already implemented an ordered map for its own use case. I guess they decided to keep maintaining it for backward compatibility instead of switching every SON reference in their codebase to collections.OrderedDict.

In small experiments with both, you can easily see that collections.OrderedDict performs better than bson.son.SON.

In [1]:    from bson.son import SON
           from collections import OrderedDict
           import copy
           import json  # needed for the json.dumps timings below

           print(set(dir(SON)) - set(dir(OrderedDict)))

{'__weakref__', 'iteritems', 'iterkeys', 'itervalues', '__module__', 'has_key', '__deepcopy__', 'to_dict'}

In [2]:    %timeit s = SON([('a',2)]); z = copy.deepcopy(s)

14.3 µs ± 758 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [3]:    %timeit s = OrderedDict([('a',2)]); z = copy.deepcopy(s)

7.54 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]:    %timeit s = SON(data=[('a',2)]); z = json.dumps(s)

11.5 µs ± 350 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]:    %timeit s = OrderedDict([('a',2)]); z = json.dumps(s)

5.35 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
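The same comparison can be reproduced outside IPython with the standard `timeit` module. The following is only a rough sketch: the `bench` helper and the `candidates` mapping are names made up for this illustration, SON is included only if the `bson` package happens to be installed, and the absolute numbers will differ between machines and library versions.

```python
import copy
import json
import timeit
from collections import OrderedDict

# Always benchmark OrderedDict; add SON only if bson is available.
candidates = {"OrderedDict": OrderedDict}
try:
    from bson.son import SON
    candidates["SON"] = SON
except ImportError:
    pass  # bson (shipped with pymongo) is not installed; skip SON


def bench(cls, number=10000):
    """Return mean time per call, in microseconds, for deepcopy and json.dumps."""
    deepcopy_s = timeit.timeit(lambda: copy.deepcopy(cls([("a", 2)])), number=number)
    dumps_s = timeit.timeit(lambda: json.dumps(cls([("a", 2)])), number=number)
    return {
        "deepcopy_us": deepcopy_s / number * 1e6,
        "dumps_us": dumps_s / number * 1e6,
    }


results = {name: bench(cls) for name, cls in candidates.items()}
for name, r in results.items():
    print(f"{name}: deepcopy {r['deepcopy_us']:.2f} µs, json.dumps {r['dumps_us']:.2f} µs")
```

Like the `%timeit` runs above, this measures construction plus the copy or dump in each iteration, so treat it as a relative comparison rather than a precise measurement.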

To answer your question about when to use SON: use it if your software has to run on versions of Python older than 2.7.

If you can help it, use OrderedDict from the collections module. You can also use a plain dict, which preserves insertion order as of Python 3.7.
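As a small illustration of the dict versus OrderedDict choice, the sketch below assumes Python 3.7 or newer and uses only the standard library. The command-style keys (`find`, `filter`, `limit`) are made up for the example. It shows that both types keep insertion order and serialize identically, and that one remaining behavioral difference is equality: OrderedDict compares order, a plain dict does not.

```python
import json
from collections import OrderedDict

# Illustrative key/value pairs; the names are hypothetical.
pairs = [("find", "users"), ("filter", {"age": 30}), ("limit", 5)]

plain = dict(pairs)
ordered = OrderedDict(pairs)

# Both preserve insertion order (guaranteed for dict from Python 3.7).
keys_plain = list(plain)
keys_ordered = list(ordered)

# Both serialize to the same JSON text.
same_json = json.dumps(plain) == json.dumps(ordered)

# Equality differs: dict ignores order, OrderedDict does not.
reversed_pairs = list(reversed(pairs))
dict_equal = dict(reversed_pairs) == plain
ordereddict_equal = OrderedDict(reversed_pairs) == ordered

print(keys_plain, keys_ordered, same_json, dict_equal, ordereddict_equal)
```

If order-sensitive comparison or methods such as `move_to_end` matter, OrderedDict is the better fit; otherwise a plain dict is usually enough.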

Oluwafemi Sule