13

I have tried some quick experiment comparing the performance of native Python lists with linked lists implementations such as this.

The native python lists are always faster than the non native linked list in the cases where they should not be (according the theory).

from linkedlist import *
import time
a = LinkedList()
b = []
for i in range(1000000):
    b.append(i)
    a.add_first(i)
t0 = time.clock()
a.remove(10)
t1 = time.clock()
b.remove(10)
t2 = time.clock()
print t1-t0
print t2-t1

The results I have on the test above are:

  • native linked list = 2.00000000001e-05

  • python list = 0.005576

  • non native linked list = 3.90000000001e-05

So, I was wondering why Python don't have a native Linked List data structure. In the case of Python, it looks to me that it could be useful algorithmically speaking to have Linked List instead of the standard Lists to speed up some aspects of the standard library.

My understanding is that the List data structure is a key building block of the language and that it makes the code more maintainable and easily optimizable to focus on that very data structure.

Is there any other reason?

Community
  • 1
  • 1
lc2817
  • 3,722
  • 16
  • 40
  • 1
    I see two `print`s in your test and three results - where is your "native" linked list coming from? – Eric Dec 12 '12 at 08:03
  • 1
    I have ran the tests several times with different implementations, Also I have built this quick and super dirty code with swig http://cl.ly/code/2A3t352q1m1Y – lc2817 Dec 12 '12 at 08:04
  • 1
    Are you asking "Why the developers decided to exclude linked list DS from python?" p.s. I think the question is a bit subjective to fit in SO, maybe Programmers.SE? – amit Dec 12 '12 at 08:04
  • You are right, but would it be somehow useful for the standard library to have an implementation of native linked list in Python in order to potentially speedup any existing apps dealing with a high volume of data. – lc2817 Dec 12 '12 at 08:07
  • http://stackoverflow.com/questions/280243/python-linked-list – alexvassel Dec 12 '12 at 08:17
  • 1
    In your loop you are appending to `b` but prepending to `a` -- is that intentional? – Russell Davis Dec 12 '12 at 08:23
  • That's a good point. No it is not intentional, the point is just to build the two lists and remove the 11th element. My remove method removes the 11th element and the remove method of the list remove the first occurrence of 10 so, in its case, the 11th element, so it comes down to the same thing even if I have to admit that my test is not perfect. – lc2817 Dec 12 '12 at 08:25
  • You didn't post your non-native linked list code, but I'm guessing it might be searching for the value 10 (rather than removing the item at index 10) which would explain why it's taking so long. – Russell Davis Dec 12 '12 at 08:38
  • You are right on that one, I will correct the time. – lc2817 Dec 12 '12 at 08:46
  • 1
    a.remove(10) needs to shift 999_990 items. b.remove(10) needs to scan 10 items. You dont compare comparable things. Use 500_000 instead of 10 to be objective. I guess python have no linked list because it's not efficient as a data structure (not aligned), and have no indexing abilities, so a poorer interface. – B. M. Mar 01 '19 at 21:15

2 Answers2

1

Indeed, Python doesn't have a native linked list implementation, and I would also love to know why.

One alternative you may want to consider is collections.deque (= "double-ended queue") which provides very fast constant-time O(1) insertion at both ends. In fact, for your specific example, a deque is a better choice than a linked list, since you don't need insertion in the middle.

However, in general, it is important to keep in mind that a deque is a different data structure than a linked list. A linked list also provides constant-time O(1) insertion in the middle, while a deque only provides linear-time O(n) insertion in the middle. In other words, the time it takes to insert an element in the middle of a deque is proportional to the current length n of the deque, and for a sufficiently large n, it will be slower than with a linked list.

Comparison between collections.deque and a true linked list

Internally, collections.deque is in fact implemented using a linked list. However, this is an implementation detail, and more importantly, it is implemented as a linked list of fixed-size blocks of elements, not as a linked list of individual elements.

This is why insertion in the middle of a collections.deque is O(n) rather than O(1): you still need to modify around half the memory of the whole deque to accommodate a new element N:

before inserting N: [ABCDE]⇄[FGHIJ]⇄[KLM__]
after inserting N:  [ABCDE]⇄[FGNHI]⇄[JKLM_]
changed memory:                ^^^   ^^^^

Instead, with a true linked list (= of individual elements), inserting a new element N in the middle simply consists of allocating a new node and updating the value of four pointers, which is an operation whose performance is independent of the current size of the linked list:

before inserting N: [A]⇄[B]⇄[C]⇄[D]⇄[E]⇄[F]⇄[G]⇄    [H]⇄[I]⇄[J]⇄[K]⇄[L]⇄[M]
after inserting N:  [A]⇄[B]⇄[C]⇄[D]⇄[E]⇄[F]⇄[G]⇄[N]⇄[H]⇄[I]⇄[J]⇄[K]⇄[L]⇄[M]
changed memory:                                ^ ^ ^                           

The trade-off is that the deque has better memory locality and requires less independent memory allocations. For example, inserting the new element N in the deque above didn't require any new memory allocation at all. This is why in practice, and especially if you're often inserting at the ends rather than in the middle, a deque is in fact a better choice than a linked list.

Note that while inserting elements in the middle of a deque is O(n), inserting new elements at the beginning or the end is O(1):

before:                [ABCDE]⇄[FGNHI]⇄[JKLM_]

prepending P:  [____P]⇄[ABCDE]⇄[FGNHI]⇄[JKLM_]
                    ^ ^

prepending Q:  [___QP]⇄[ABCDE]⇄[FGNHI]⇄[JKLM_]
                   ^

appending R:   [___QP]⇄[ABCDE]⇄[FGNHI]⇄[JKLMR]
                                            ^

appending S:   [___QP]⇄[ABCDE]⇄[FGNHI]⇄[JKLMR]⇄[S____]
                                              ^ ^

Caveats

Of course, for the linked list insertion to actually be O(1), this assumes that you already have a handle h to the node before or after which you want to insert your new node n. On a hypothetical implementation of LinkedList, this could look like:

n = linkedlist.insertbefore(h, "some value")

where:

type(h)     # => <class 'Node'>
type(n)     # => <class 'Node'>
n.value     # => "some value"
n.next == h # => True

If you do not have such handle, then a function like insert(i, x) will still be O(n), because finding the i-th element is O(n), even though the insert operation itself is O(1). Here is some hypothetical implementation of insert(i, x) on our hypothetical LinkedList:

def insert(self, i, x):
    node = self.node_from_index(i)       # Find the i-th node: O(n)
    return self.insertbefore(node, x)    # Insert and return new node: O(1)

This means that linked lists are only worth it for certain problems when you do keep those node handles around. It also makes the API a bit less convenient, and despite each operation being O(1) if you're careful, the constant is generally much bigger. So in practice, they are not that often useful, which may be why they're aren't built-in linked lists in Python.

Boris Dalstein
  • 7,015
  • 4
  • 30
  • 59
-2

It's just because building list takes majority of the time rather than append method. So, when it's not a linear time operation as you showed, (ex : n^2 operation) append method will be more significant than building which will lead to the result you want to see.

임정섭
  • 347
  • 2
  • 9