4

Many programming languages offer built-in datastructures, though some datastructures, such as the linked list is known to be very easy to implement, and therefore, languages often don't include that as a built-in datastructure. Some languages, such as C++ has (as std::list, doubly-linked), as well as Java (as LinkedList<T>, doubly-linked).

For example, Python already has several datastructures built-in in its library, including lists, tuples, sets, dictionaries (which do not even require an import), as well as datastructures from the collections library. It does not have a built-in LinkedList data structure because in Python it is already easy to implement your own LinkedList.

Please note that the underlying datastructure a Python list is really an array as answered in "What is the underlying data structure for Python lists?". Also, the deque in collections is a double-ended queue, which should have missing features as it doesn't normally support indexing or inserting in the middle (however, indexing features have been added since 3.5 with index and insert, but aren't present in Python 2).

It seems that offering a built-in LinkedList datastructure is redundant and unnecessary. Why do some other programming language libraries (such as the C++ and Java library) include LinkedList datastructure, even if they can be easy to implement? If so, what are the risks of implementing your own linked list in these languages?

  • 5
    I believe Guido had a post about this, possibly on his Python History blog, but I can't find it. IIRC, the point was that there are many different options for a linked list (single or double linked? node as list or handle as list? mutable or immutable? tail-sharing?), they're all very easy to build in Python, there's no real speed or space advantage to implementing them in C (unlike arrays, or hash tables), and none of them is useful so much more often than the others that it needs to be builtin. – abarnert May 12 '18 at 21:20
  • 1
    Linked lists are usually a horrible idea. Their main claim to fame is that they are trivial to implement. In particular, most real-world linked lists are ad-hoc intrusive linked lists, not container-like linked lists. – o11c May 12 '18 at 21:22
  • @o11c Another thing to note is *why* Java and C++ provided a built-in LinkedList? –  May 12 '18 at 21:23
  • 1
    @abarnert are you able to locate this? it would be very useful for providing a proper answer. – Cosmo May 13 '18 at 05:56
  • @Cosmo No, I couldn't find it. But I can write a speculative answer… – abarnert May 13 '18 at 05:57
  • 1
    @abarnert I've been considering writing a speculative answer combining your comment with quotes from the documentation of the linked list API from various languages, which are approximately `Almost always it is better to use *our contigous resizable array* or *our deque* instead of *our linked list*. In general, array-based containers are faster, more memory efficient and make better use of CPU cache.` I'm kind of new though, so I wasn't sure how frowned upon such speculation is in an actual answer. – Cosmo May 13 '18 at 06:00
  • @Cosmo It will require more research, but I would consider recording the time taken for operations on array-based and non-array based containers. –  May 13 '18 at 06:01
  • We should probably also talk about/reference the idea of amortised performance. Possibly by linking a more generic SO question/SEE question about the performance benifits of array-based containers. – Cosmo May 13 '18 at 06:06
  • 1
    @Cosmo The standard answer is that you almost always want either a wide tree or a long rope, to get the best of both worlds… except that actually, you don't need to scale to 3 billion elements and still splice in the middle, so you just want an array. Raymond Hettinger's comments on [the `blist` pep](https://www.python.org/dev/peps/pep-3128/) might be relevant here. – abarnert May 13 '18 at 06:40

1 Answers1

11

I searched through Guido's Python History blog, because I was sure he'd written about this, but apparently that's not where he did so. So, this is based on a combination of reasoning (aka educated guessing) and memory (possibly faulty).

Let's start from the end: Without knowing why Guido didn't add linked lists in Python 0.x, do we at least know why the core devs haven't added them since then, even though they've added a bunch of other types from OrderedDict to set?

Yes, we do. The short version is: Nobody has asked for it, in over two decades. Almost of what's been added to builtins or the standard library over the years has been (a variation on) something that's proven to be useful and popular on PyPI or the ActiveState recipes. That's where OrderedDict and defaultdict came from, for example, and enum and dataclass (based on attrs). There are popular libraries for a few other container types—various permutations of sorted dict/set, OrderedSet, trees and tries, etc., and both SortedContainers and blist have been proposed, but rejected, for inclusion in the stdlib.

But there are no popular linked list libraries, and that's why they're never going to be added.

So, that brings the question back a step: Why are there no popular linked list libraries?

First, "linked list" is not really one type, but a whole family of types—single vs. double-linked, node-as-list vs. handle-as-list, tail sharing or not, lazy (presumably via @property-style triggers) or strict… as well as variations like circular lists. And each one of these is useful far less than the family as a whole, of course. And you can't share all that much between interfaces or implementations for them—an interface for dealing with Lisp-style cons lists makes no sense at all for something like a C++ std::list.

Second, all of the different variations are dead easy to build. The reason OrderedDict made it into the language is that people were able to show that it's actually very tricky to get right. Sets are easy to get right, but impossible to space-optimize without borrowing behind-the-scenes implementation details out of dicts. Linked lists are easy to get right and to optimize. If you want one, any of the varieties, you build one in a few minutes, and you're done. (In fact, OrderedDict uses one under the hood…)

Meanwhile, since Python 2.3, and especially since 3.0, the iterator protocol has become the central paradigm for everything. And there's a reason other languages are borrowing Python's generators—but in most of those languages, they don't provide the same benefits, because the whole language isn't built around them. Being able to forward-iterate any collection—including a "virtual" collection that doesn't even exist—already gives you most of the advantages of a lazy tail-sharing cons list without needing one. On the other hand, the main thing it's missing is convenient constant-time (mutating) splicing—but that one missing thing is actually pretty hard to fit into the iterator paradigm. (See itertools for how you can do it—it's just tee and chain, after all—and for how clumsy it feels to do so.)

(Of course there are also advantages to a more complicated iteration protocol, like the one used by C++, or, even better, Swift. With bidirectional and mutable vs. immutable iteration, doubly-linked lists might be a lot more useful. But there's a tradeoff between power and simplicity there, and Python made its choice long ago.)

And, last, but definitely not least, linked lists, especially cons-style tail-sharing linked lists, are an inherently recursive data structure. The Python philosophy is to use recursion only when you have inherently recursive problems. And Guido believes there are very few of these, which is why Python doesn't do, e.g., tail recursion elimination. He didn't want map and reduce in his language, but once he figured out how to do them iteratively instead of recursively, he was willing to let them stay. And storing linear data is not an inherently recursive problem. Trees? Sure, they pretty much have to be recursive. But lists? No.

abarnert
  • 354,177
  • 51
  • 601
  • 671