In other languages, there are built-in data structures with binary search tree semantics, for example C++'s std::map
.
Note that I am using C++ as an example, but not because I am attempting to draw a direct comparison between C++ itself and Python. I have received close votes on this question as "opinion based", but there is nothing "opinion based" about this question.
By binary search tree semantics I mean the following:
- I can insert elements into the structure in sub-linear time, i.e. O(logn) for std::map
- I can iterate over all of the elements in sorted order in linear time, i.e. O(n) for std::map
Is there any data structure in Python with binary search tree semantics?
To illustrate, can the following program be written efficiently with built-in Python data structures?
Imagine that I am a fund manager at a bank. Bank accounts are added to my fund very frequently. A bank account has an ID and a value. A smaller ID means that the bank account was created earlier than an account with a larger ID. Each time that an account is added to my fund, I would like to know the ID number of the 1000th oldest account in my fund, for my records.
Here are implementations in Python 3 and C++:
from random import randint
dist = lambda: randint(0, 2147483647)
def main():
my_map = {}
# fills a map with 1000000 random mappings of type (<int>: <int>)
for i in range(0, 1000000):
my_map[dist()] = dist()
# prints out the 1000th smallest key and its value
target_key = nth_smallest_key(1000, my_map)
print("({}: {})".format(target_key, my_map[target_key]))
# writes a new random mapping to the map
# then prints out the 1000th smallest key and its value if that key
# has changed
# 100000 times
for i in range(100000):
my_map[dist()] = dist()
test_key = nth_smallest_key(1000, my_map)
if target_key != test_key:
target_key = test_key
print("({}: {})".format(target_key, my_map[target_key]))
# print an indicator every 100 iterations for comparison
if i % 100 == 0:
print("iteration: {}".format(i))
def nth_smallest_key(n, m):
return sorted(m.keys())[n]
if __name__ == "__main__":
main()
#include <cstdio>
#include <map>
#include <random>
using namespace std;
int nth_smallest_key(int n, map<int, int>& m);
int main() {
random_device rd;
mt19937 psrng(rd());
uniform_int_distribution<> dist(0, 2147483647);
map<int, int> my_map;
// fills a map with 1000000 random mappings of type (<int>: <int>)
for (int i = 0; i < 1000000; i++) {
my_map[dist(psrng)] = dist(psrng);
}
// prints out the 1000th smallest key and its value
int target_key = nth_smallest_key(1000, my_map);
printf("(%d: %d)\n", target_key, my_map[target_key]);
// writes a new random mapping to the map
// then prints out the 1000th smallest key and its value if that key
// has changed
// 100000 times
for (int i = 0; i <= 100000; i++) {
my_map[dist(psrng)] = dist(psrng);
int test_key = nth_smallest_key(1000, my_map);
if (target_key != test_key) {
target_key = test_key;
printf("(%d: %d)\n", target_key, my_map[target_key]);
}
}
return 0;
}
int nth_smallest_key(int n, map<int, int>& m) {
map<int, int>::iterator curr = m.begin();
for (int i = 0; i < n; i++) {
curr = next(curr);
}
return curr->first;
}
Makefile:
buildcpp:
g++ -o main bst_semantics_cpp.cpp
runcpp: buildcpp
./main
runpython:
python3 bst_semantics_python.py
In C++, this program runs on my machine in approximately 5 seconds
$ time ./main
(2211193: 2021141747)
(2208771: 1079444337)
(2208700: 1187119077)
(2208378: 1447743503)
...
(1996019: 1378401665)
(1989217: 1457497754)
(1989042: 1336229409)
real 0m4.915s
user 0m4.750s
sys 0m0.094s
$
However, in Python, the program has not reached 1% of completion after 120 seconds
$ time make runpython
python3 bst_semantics_python.py
(2158070: 1498305903)
iteration: 0
iteration: 100
iteration: 200
^CTraceback (most recent call last):
File "bst_semantics_python.py", line 36, in <module>
main()
File "bst_semantics_python.py", line 23, in main
test_key = nth_smallest_key(1000, my_map)
File "bst_semantics_python.py", line 33, in nth_smallest_key
return sorted(m.keys())[n]
KeyboardInterrupt
Makefile:8: recipe for target 'runpython' failed
make: *** [runpython] Error 1
real 2m2.040s
user 1m59.063s
sys 0m0.375s
$
To write this program efficiently, you need a data structure with binary search tree semantics.
Is there such a data structure in Python? Can you write this program efficiently with built-in Python data structures?