167

By which I mean a structure with:

  • O(log n) complexity for x.push() operations
  • O(log n) complexity to find an element
  • O(n) complexity to compute list(x) which will be sorted

I also had a related question about performance of list(...).insert(...) which is now here.

Community
  • 1
  • 1
ilya n.
  • 18,398
  • 15
  • 71
  • 89
  • `memcpy` is still a *O(n)* operation. I am not sure how Python implements lists *exactly*, but my bet would be that they are stored in contiguous memory (certainly not as a linked list). If that is indeed so, the insertion using `bisect` which you demonstrate will have complexity *O(n)*. – Stephan202 Jul 10 '09 at 15:38
  • 3
    Sadly not out the box. But Grant Jenk's [sortedcontainers](http://www.grantjenks.com/docs/sortedcontainers/) library is excellent. http://stackoverflow.com/a/22616929/284795 – Colonel Panic Jul 28 '15 at 15:31

9 Answers9

112

Is there a particular reason for your big-O requirements? Or do you just want it to be fast? The sortedcontainers module is pure-Python and fast (as in fast-as-C implementations like blist and rbtree).

The performance comparison shows it benchmarks faster or on par with blist's sorted list type. Note also that rbtree, RBTree, and PyAVL provide sorted dict and set types but don't have a sorted list type.

If performance is a requirement, always remember to benchmark. A module that substantiates the claim of being fast with Big-O notation should be suspect until it also shows benchmark comparisons.

Disclaimer: I am the author of the Python sortedcontainers module.


Installation:

pip install sortedcontainers

Usage:

>>> from sortedcontainers import SortedList
>>> l = SortedList()
>>> l.update([0, 4, 1, 3, 2])
>>> l.index(3)
3
>>> l.add(5)
>>> l[-1]
5
Colonel Panic
  • 132,665
  • 89
  • 401
  • 465
GrantJ
  • 8,162
  • 3
  • 52
  • 46
  • 7
    Indeed I compared sortedcontainers against bisect: `0.0845024989976` for SortedList.add() vs `0.596589182518` for bisect.insort(), thus a difference of 7x in speed! And I expect the speed gap to increase with the list length since sortedcontainers insertion sort works in O(log n) while bisect.insort() in O(n). – gaborous Jul 19 '15 at 14:13
  • 1
    @gaborous because bisect still uses a list, so the insertion remains `O(n)` – njzk2 May 18 '17 at 19:03
  • Just a 5 cents: If one needs to sort objects from custom class, use SortedList's subclass, called SortedKeyList. – Andrey Dec 08 '22 at 04:12
  • Presumably, the reason for the big-O requirements is specifically that they imply either a balanced binary tree representation or equivalent. – Karl Knechtel Feb 10 '23 at 16:59
67

The standard Python list is not sorted in any form. The standard heapq module can be used to append in O(log n) to an existing list and remove the smallest one in O(log n), but isn't a sorted list in your definition.

There are various implementations of balanced trees for Python that meet your requirements, e.g. rbtree, RBTree, or pyavl.

Arne
  • 17,706
  • 5
  • 83
  • 99
Martin v. Löwis
  • 124,830
  • 17
  • 198
  • 235
  • 2
    +1 for rbtree, it works very well (but contains native code; not pure python, not so easy to deploy perhaps) – Will May 13 '11 at 12:34
  • 13
    [sortedcontainers](http://www.grantjenks.com/docs/sortedcontainers/) is pure-Python and fast-as-C (like rbtree) with a performance comparison. – GrantJ Mar 27 '14 at 17:32
  • "isn't a sorted list in your definition." How so? – Colonel Panic Jul 28 '15 at 15:36
  • 6
    heapq only allows to find the smallest element; the OP was asking for a structure that can find any element in O(log n), which heaps are not. – Martin v. Löwis Aug 07 '15 at 17:27
39

Though I have still never checked the "big O" speeds of basic Python list operations, the bisect standard module is probably also worth mentioning in this context:

import bisect
L = [0, 100]

bisect.insort(L, 50)
bisect.insort(L, 20)
bisect.insort(L, 21)

print L
## [0, 20, 21, 50, 100]

i = bisect.bisect(L, 20)
print L[i-1], L[i]
## 20, 21

PS. Ah, sorry, bisect is mentioned in the referenced question. Still, I think it won't be much harm if this information will be here )

PPS. And CPython lists are actually arrays (not, say, skiplists or etc) . Well, I guess they have to be something simple, but as for me, the name is a little bit misleading.


So, if I am not mistaken, the bisect/list speeds would probably be:

  • for a push(): O(n) for the worst case ;
  • for a search: if we consider the speed of array indexing to be O(1), search should be an O(log(n)) operation ;
  • for the list creation: O(n) should be the speed of the list copying, otherwise it's O(1) for the same list )

Upd. Following a discussion in the comments, let me link here these SO questions: How is Python's List Implemented and What is the runtime complexity of python list functions

Community
  • 1
  • 1
ジョージ
  • 1,476
  • 1
  • 22
  • 29
  • push() should be in O(log n) since the list is already sorted. – estani Apr 11 '12 at 12:05
  • 1
    may be I should have said ["for an insert op"](http://docs.python.org/library/bisect.html#bisect.insort_left). anyway, that was about a year ago so now I can easily mix things up or miss something – ジョージ Apr 13 '12 at 04:53
  • You can always insert a value into a sorted list in O(log n), see binary search. push() is defined as an insert operation. – estani Apr 16 '12 at 16:29
  • 3
    True. But while _finding_ the insert location would indeed take O(log n) ops, the actual insert (i.e. adding the element to the data structure) probably depends on that structure (think inserting an element in a sorted array). And as [Python lists are actually arrays](http://www.laurentluce.com/posts/python-list-implementation/), this may take O(n). Due to the size limit for the comments, I will link two related SO questions from the text of the answer (see above). – ジョージ Apr 22 '12 at 00:35
  • Good argument. I wasn't aware list where handled as arrays in Python. – estani Apr 23 '12 at 12:57
  • It is a bit of a misnomer indeed. – ジョージ Apr 24 '12 at 09:05
  • This doesn't answer the question. It's about an algorithm rather than a data structure, and it just describes maintaining an ordinary list in sorted order - in a way that doesn't meet the performance requirements that would be met by a balanced binary tree or similar. (Of course, it is O(1) rather than O(lg N) for *indexing*; there is still contiguous storage, whereas the tree-based approach needs to do work comparable to a search in order to index.) – Karl Knechtel Feb 10 '23 at 17:03
5

Though it does not (yet) provide a custom search function, the heapq module may suit your needs. It implements a heap queue using a regular list. You'd have to write your own efficient membership test that makes use of the queue's internal structure (that can be done in O(log n), I'd say...). There is one downside: extracting a sorted list has complexity O(n log n).

Stephan202
  • 59,965
  • 13
  • 127
  • 133
  • It's nice but hard to bisect. – ilya n. Jul 10 '09 at 15:46
  • 3
    How can there be an O(log n) membership test in a heap? If you are looking for value x, you can stop looking down a branch if you find something larger than x, but for a random value of x it is 50% likely to be at a leaf, and you probably can't prune much. – markets Nov 12 '10 at 15:06
5
import bisect

class sortedlist(list):
    '''just a list but with an insort (insert into sorted position)'''
    def insort(self, x):
        bisect.insort(self, x)
MSeifert
  • 145,886
  • 38
  • 333
  • 352
Dave31415
  • 2,846
  • 4
  • 26
  • 34
1

An AVL Tree coupled with in-order traversal will solve this problem in the required time complexity.

FRob
  • 3,883
  • 2
  • 27
  • 40
0

It may not be hard to implement your own sortlist on Python. Below is a proof of concept:

import bisect

class sortlist:
    def __init__(self, list):
        self.list = list
        self.sort()
    def sort(self):
        l = []
        for i in range(len(self.list)):
            bisect.insort(l, self.list[i])
        self.list = l
        self.len = i
    def insert(self, value):
        bisect.insort(self.list, value)
        self.len += 1
    def show(self):
        print self.list
    def search(self,value):
        left = bisect.bisect_left(self.list, value)
        if abs(self.list[min([left,self.len-1])] - value) >= abs(self.list[left-1] - value):
            return self.list[left-1]
        else:
            return self.list[left]

list = [101, 3, 10, 14, 23, 86, 44, 45, 45, 50, 66, 95, 17, 77, 79, 84, 85, 91, 73]
slist = sortlist(list)
slist.show()
slist.insert(99)
slist.show()
print slist.search(100000000)
print slist.search(0)
print slist.search(56.7)

========= Results ============

[3, 10, 14, 17, 23, 44, 45, 45, 50, 66, 73, 77, 79, 84, 85, 86, 91, 95, 101]

[3, 10, 14, 17, 23, 44, 45, 45, 50, 66, 73, 77, 79, 84, 85, 86, 91, 95, 99, 101]

101

3

50

Fan
  • 97
  • 1
  • 3
0

Use the standard library bisect module or the third-party sortedcontainers package. Alternately, the heapq module solves the problem by implementing a heap queue.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Slass33
  • 66
  • 9
-1

Interesting case: if your list L is already sorted (for example because you appended them in a sorted order), you can benefit from a fast lookup in O(log n) with a standard Python list with this method:

import bisect
def in_sorted_list(elem, sorted_list):
    i = bisect.bisect_left(sorted_list, elem)
    return i != len(sorted_list) and sorted_list[i] == elem
L = ["aaa", "bcd", "hello", "world", "zzz"]
print(in_sorted_list("hellu", L))       # False

More details in this answer.

Basj
  • 41,386
  • 99
  • 383
  • 673