39

Are there any self-balancing binary search tree (RED-BLACK, AVL or others) built-in types in Python 2.7 or Python 3.x?

I am looking for something equivalent to Java's TreeMap or TreeSet.

If there are no such built-ins, why have they been ommited? Is there a special reason, for not including such tools?

Alexandru Chirila
  • 2,274
  • 5
  • 29
  • 40
  • 1
    There are none in the standard library. Why one hasn't been included is impossible to answer. The rest of your question is a request for external resources, which is off-topic. – Martijn Pieters Jul 25 '13 at 12:03
  • Near-dupe of http://stackoverflow.com/questions/2298165/pythons-standard-library-is-there-a-module-for-balanced-binary-tree – Fred Foo Jul 25 '13 at 12:17
  • 1
    There's a Heap implementation which is a sort of binary tree. Look for `heapq` – Eduardo Feb 16 '22 at 05:21

2 Answers2

41

There's no special reason, to my knowledge - I'd guess that the reason is that for so many applications the highly-tuned dict and set implementations (which are hash tables) work well. They're good enough in most cases. There are definitely situations where you need the performance characteristics of balanced binary search trees (like ordered traversal based on key- rather than addition-order), but those are far enough off the beaten path that people are happy with grabbing a third-party package in that case.

I've had a good experience using the bintrees package on PyPI. This has implementations of unbalanced, AVL and red-black binary trees, in both pure Python and as extensions written in Cython.

I think the rest of the reason is essentially historical accident. If the person who wrote bintrees lobbied for its inclusion in the stdlib, and was willing to put up with the constraints that imposes on maintenance and releases, it would probably go in. (Although the Cython dependency would cause a problem, I'd guess.)

Algorithmic complexity:

For hash tables (like dicts or sets), insertion and lookup are O(1), while for a balanced tree these are O(log(n)). In-order traversal of keys is O(n) in a tree, but to do the same thing with a hash table you need to sort the keys first, so it's O(n*log(n)). When you're picking which kind of data structure to use, you need to think about which operations you're going to be using, and pick the tradeoff that makes the most sense in your application.

babbageclunk
  • 8,523
  • 1
  • 33
  • 37
  • This makes a lot of sense. I'm guessing that probably `dict` and `set` are actually binary search trees, or maybe something even more efficient. I will make some research in how exactly these are implemented. – Alexandru Chirila Jul 25 '13 at 12:15
  • 7
    @ChirilaAlexandru: `dict` and `set` are hash tables. – Fred Foo Jul 25 '13 at 12:16
  • @wilberforce: I'd give `blist` a better chance of landing int the stdlib. I think it's actually been considered at some point, but I don't recall the details. – Fred Foo Jul 25 '13 at 12:18
  • @larsmans: well aren't binary search trees (search running in O(log n)) more efficient than hash tables (search running in O(n)). Since hashes are numbers, therefore easily comparable, binary search trees seem to be a more logical solution. Am I missing something? – Alexandru Chirila Jul 25 '13 at 12:20
  • 1
    @ChirilaAlexandru: hash table lookups are O(1), rather than O(n) - this is why they're so often what you want. This is sometimes balanced by the fact that choosing a good hash function can be tricky - if you have lots of collisions a lot of the benefit disappears. – babbageclunk Jul 25 '13 at 12:28
  • @wilberforce: I got it now, thanks again for the great answer. – Alexandru Chirila Jul 25 '13 at 12:29
  • 2
    Or rather, hash tables in the worst case can be O(n), but with good hash functions are O(1) in the expected case. They do use more memory than a tree of the same size to help ensure the expected case actually happens, but the extra memory is also used to amortize the cost expanding the data structure as more items are added as well. – chepner Jul 25 '13 at 12:32
  • @larsmans: you might be right, although blist is based on b+ trees, I think - a bit of a different beast. Although much closer to binary trees than hash tables. – babbageclunk Jul 25 '13 at 12:35
  • @wilberforce: yes, but B+ trees can offer the same set of operations that binary trees offer (notably Mapping + sorted iteration), just faster in most cases and using less memory. – Fred Foo Jul 26 '13 at 07:32
  • 3
    i think sorted containers is recommended now https://pypi.org/project/sortedcontainers/ instead of the bintree library – exifguy Dec 18 '19 at 16:26
6

You won't find any trees in the standard library. Python heavily uses dictionary that is hash table for its internal (object, classes and modules are all based on dicts). Therefore dicts has been greatly optimized. This make the needs for search trees much smaller. Also to be efficient such trees would have been implemented in an extension type.

hivert
  • 10,579
  • 3
  • 31
  • 56
  • 1
    How do dicts make the need for search trees smaller? And what is an extension type? – Lucas Mumbo Apr 20 '22 at 22:00
  • 1
    dict = hash table. extension type = builtin type – hivert Apr 22 '22 at 06:33
  • 3
    To my understanding dicts don't replace search trees; they each have their own use cases – Lucas Mumbo Apr 22 '22 at 06:55
  • 1
    One of the main reasons one would want balanced bst instead of hashtable is the possibility to iterate all items that are less or larger than one specified, which is very efficient in case of database indexes. There's no such possibility in a hashtable without extra calculations since it's unordered – toozyfuzzy Jul 26 '22 at 08:17