
There are at least two ways to represent a linked list:

1) An array-based representation of the linked list, where we keep an `std::vector` of structs of the type

    struct {
        <whatever-type-you-want> item;
        int nextitem;
    };

Here, inserting into the list means doing a `push_back()` on the vector and giving `nextitem` an appropriate value.

2) The usual pointer-based representation, in which you have a collection of structs scattered all over RAM. Here an insert is done with the C++ operator `new`.

Is it correct to say that the first method is more efficient, since all the items are in consecutive memory locations, and that because of this one might be able to grow the linked list to a much larger size than with the second method?

In the second method, memory fragmentation with huge linked lists might mean one gets a segmentation fault much earlier.

smilingbuddha
    A vector of elements is hardly a "linked list". – Kerrek SB Jul 15 '12 at 23:17
    I mean, one can emulate linked lists in this fashion. The record `nextitem` tells which element in the vector is the next item – smilingbuddha Jul 15 '12 at 23:18
  • I think it is just something that you have to test. – Vaughn Cato Jul 15 '12 at 23:19
  • More often than not, you don't want huge linked lists... – David Rodríguez - dribeas Jul 15 '12 at 23:20
  • Not sure what you mean by "since one can grow linked lists to much larger sizes...". – Vaughn Cato Jul 15 '12 at 23:20
  • 2. Something to consider is that you will run into the memory limit much sooner using a vector as most implementations ask for 1.5-2.0 times the original size when they need reallocation. – Tim Finer Jul 15 '12 at 23:20
    I'm not convinced that you understand conceptually what a linked list is. – Ed S. Jul 15 '12 at 23:21
  • @TimFiner: however there is much less overhead than with heap allocation. – Vaughn Cato Jul 15 '12 at 23:21
  • @EdS. Why? Where am I going wrong, conceptually? – smilingbuddha Jul 15 '12 at 23:23
  • Your link list is going to behave in many ways like an array, which doesn't make a lot of sense. You would be better off just using an array (or vector). A linked list is really a way to describe behavior, not implementation (though implementations don't typically vary much) Call it what you like, but your idea is simply not a linked list. – Ed S. Jul 15 '12 at 23:26
  • I think this would be a good question if it was more in the form "What are the performance tradeoffs of implementing a linked list this way." What is actually more efficient is going to depend on a lot of details. – Vaughn Cato Jul 15 '12 at 23:30
    @EdS. I think the proposal is a singly-linked list. Either I'm missing something, or you misunderstood the proposal. Instead of a pointer to an address, the link to the next node is the vector index of the next node. – Darshan Rivka Whittle Jul 15 '12 at 23:49
  • Maybe you could give some details about how you plan to implement insert/delete. – Vaughn Cato Jul 15 '12 at 23:49
  • @KerrekSB It's not a vector of elements, it's a vector of nodes, each of which is a value and a link to the next node. – Darshan Rivka Whittle Jul 15 '12 at 23:50
  • @DarshanComputing: Yeah, but it is really just a vector of elements plus some extra stuff. Which isn't actually needed for the usage that the OP describes. – Kerrek SB Jul 15 '12 at 23:58
  • @VaughnCato True. I think if the number of elements is unknown, though, you'll run out of space much faster with a vector. Vector has a good chance of fragmenting memory before reaching the limit of unused memory or asking for more than is available sooner. Without testing though, this is all speculation. – Tim Finer Jul 16 '12 at 00:06
  • @KerrekSB I'm not seeing a clear use case in the question. But why ignore the "extra stuff"? It's the extra stuff (a link to the next node) that makes it precisely satisfy the definition of a linked list... – Darshan Rivka Whittle Jul 16 '12 at 00:10
  • A while ago I would have thought this was plain stupid. But just now I have a custom data structure like this (pretty damn more complex, of course). The advantage is that I can throw the entire thing away very fast, which is what made me create it. – Behrooz Jun 01 '13 at 15:27

6 Answers


I'll go against everyone else here and say that yes, the first approach might end up being more efficient. In the second approach, you're allocating memory on the heap O(N) times, N being the number of nodes in the list. If you're using a vector, you're only making O(log N) heap allocations, assuming the usual geometric growth strategy.

Also, if you're on a 64-bit machine, the overhead of storing a pointer in each node may be too much if you're dealing with lots of small items. Using a vector, you can use a smaller nextItem - e.g. 32-bit instead of 64-bit - which, if you're making a list to hold 32-bit ints, would be a 1.5× improvement in memory usage.

Another possible optimization is that if you know up-front that you'll be dealing with a lot of elements, you can reserve a big vector and have a single heap allocation for a pretty long time.

I recently took a course on applications of automata, and the lecturer implemented some of the algorithms for pretty large data sets. One of the techniques he showed us was exactly your first approach to representing a linked list. For a piece of course work I tried implementing it both ways (with pointers, and with a vector plus a nextItem field), and the vector version performed much better (it had other optimizations too, but the vector definitely had an effect).

NOTE TO OTHERS

I think what @smilingbuddha is asking about is more like a collection of linked lists - or at least that's what I've used it for. For example, when you store a graph as adjacency lists: you need a linked list (or array, or whatever) of all the neighbors of each node. So instead of keeping an array of linked lists or a vector of vectors, you just keep an array of indexes pointing to the last inserted neighbor of every node.

Ivan Vergiliev

Implementing a list with a vector is misguided.


I'll explain. Containers are typically designed to achieve a certain set of goals, and the underlying implementation is selected based on those goals.

A vector is very good because its memory is contiguous and you can reach any cell by pointer arithmetic. Unfortunately, a vector has terrible performance when inserting or deleting an element in the middle.

A list has the exact opposite intention. Navigating to a point in a list is time-consuming because you have to follow links - it's not contiguous. But the primary purpose of a list is to allow fast insertions, deletions, reordering, splicing, reversals, etc.


So, thinking of a vector as an implementation base for a list (while it can be done) really isn't the way to look at this. Implementing a list with a vector would basically mean you give up the advantages that made you choose a list in the first place.


EDIT

As other people have pointed out in the comments below, if you're thinking of more complex implementations, you could definitely get performance benefits out of this.

For example, if you maintain a vector with references to all of the pointers, and you work to keep that reference vector in order, you can have the benefits of pointer-arithmetic access while still having relatively fast removal/insertion, etc. Also, since the reference vector just holds pointers to dynamically allocated objects, manipulating the reference vector isn't costly and you still don't have to have a massive region of contiguous memory being used (the vector would just be NumElements * sizeof(pointer) on your architecture).

You should look at a std::deque implementation for some fun. They have some interesting interplay between contiguous memory regions linked by pointers to speed up insert/delete/other operations.

John Humphreys
  • I disagree. An array-backed linked-list offers no particular disadvantages (if implemented correctly), whilst offering the potential advantage of spatial locality. – Oliver Charlesworth Jul 15 '12 at 23:31
  • Actually, I've thought about it some more, and I take that back! – Oliver Charlesworth Jul 15 '12 at 23:32
  • Touche though, I had to think about that pretty hard after you mentioned that last comment, haha. I didn't think about more complex implementations like that when I answered the first time. So, you had a fair point. – John Humphreys Jul 15 '12 at 23:33
  • I've implemented linked lists in this way for certain special purposes, and I've seen performance benefits. It would probably be similar to the benefit that you would get with a custom allocator. – Vaughn Cato Jul 15 '12 at 23:34
    @w00te: Perhaps! Insertion and deletion aren't inefficient *per se*; they're amortised O(1) on a vector (which is why I initially downvoted). The problem is keeping track of which elements are free to be reused (when I realised this, I removed the downvote!). – Oliver Charlesworth Jul 15 '12 at 23:35
  • @OliCharlesworth Alright, I threw up some extra information since I didn't appreciate the complexity the first time I answered :) Feel free to add anything you think I missed, sounds like you might know some things I haven't thought of. – John Humphreys Jul 15 '12 at 23:42
  • @oli - one could implement it with one head pointing to the data cells and another head pointing to the deleted, reusable data cells – EvilTeach Jul 16 '12 at 00:33
  • In .Net, the defacto List implementation is based on a vector. To borrow a phrase from Eric Lippert, "O(n) is O(1) if n does not grow large." http://stackoverflow.com/a/6750591/1272096 – Monroe Thomas Jul 16 '12 at 01:51

On the contrary: with your first method, it is inefficient to remove items from the linked list, as you "lose" the slot in the vector where the item was stored and would have to walk the whole list, garbage-collection style, to discover which slots are unused.

With regard to memory fragmentation, having lots of small allocations is not an issue generally; indeed as a vector is required to be contiguous allocating the memory for it will cause fragmentation as you require larger and larger blocks of contiguous memory. In addition, each time the vector is resized you are causing the copying of large blocks of memory.

In fact, your first answer is arrogating to yourself the job of the memory allocator and memory management unit. The job of the memory allocator is to hand out small chunks of memory; the job of the MMU (among others) is to ensure that pointers between blocks of memory continue to point to the same logical memory even when moved around in physical memory. Your nextitem int members are essentially functioning as pointers. Unless you have very specialised requirements, the hardware, kernel and malloc can do this job far better than you can.

ecatmur

Your logic is completely backwards. The first approach requires that memory be contiguous and will fail as soon as insufficient contiguous memory is available. Your second approach can use memory whether contiguous or not and will continue to work until no memory at all remains.

David Schwartz

Your first approach seems to blend two algorithms and, therefore, I would say is less efficient.

One of the advantages of a linked list is that items can easily be inserted and deleted, yet with your approach they require shifting data around. You may as well simply use a resizable array.

In addition, an array requires memory to be contiguous. In some circumstances, you will run out of memory sooner than with a true linked list when working with large amounts of data because there may be times when a certain amount of memory is available, but not contiguously.

Jonathan Wood

If you remove an element from the list in the case #1, a good part of the remaining elements may get their nextitem indexes messed up. So #2 is the usual way to go and won't cause any memory problems if properly implemented, unless you try to insert an insane number of elements into the list, or any other container for that matter.

Desmond Hume