53

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:

  1. array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.

  2. stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[] and a linked list .

  3. stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.

Right now, my choice is to use an array. Is it justifiable? Or did I miss something?

Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?

What is the best way to measure this performance? Can I just display the timestamp before and after the operations?

Mohan
  • 1,850
  • 1
  • 19
  • 42
jasonline
  • 8,646
  • 19
  • 59
  • 80
  • 4
    Why do you need to do any array shifting if it's a fixed size? – Myles Dec 15 '09 at 05:56
  • 2
    What are you actually implementing? – GManNickG Dec 15 '09 at 06:00
  • 1
    @Myles: It's like a ranking information. Think of it like a list of 10 prices sorted in increasing order. If a new price comes and I have to insert it into the middle rank, then the current last price has to go. So I need to shift some positions to make way for the new rank. – jasonline Dec 15 '09 at 06:30
  • 1
    *"accessing an item is slower than an array since it is a call to operator[]"* - Ok, stop right there. – Christian Rau Jul 16 '13 at 15:59
  • Another option is a std::array, if the length truly is fixed. – Adrian McCarthy Jul 16 '13 at 16:00

8 Answers8

64

Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.

You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.

I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.

Frank Krueger
  • 69,552
  • 46
  • 163
  • 208
  • 3
    I read somewhere in some post it was looping through more than 1000 items comparing array and vector and in the vector's case, it was mentioned it took longer... and the answers said that it is because it is a template, using operator[] produces overhead...? – jasonline Dec 15 '09 at 05:57
  • 7
    @jasonline I use vectors all over a game I wrote for the iPhone. I have never once run into a performance issue with the use of operator[]. These are what we call "micro optimizations" - they're micro because they have little effect on the overall performance of the app. While I spent years learning the lessons of micro optimization, I highly recommend you avoid them and focus on creating a great app. – Frank Krueger Dec 15 '09 at 06:01
  • 2
    Actually vector is always faster than list for inserts and deletes **if you have to search for the element**, especially on large datasets. While vector operations are always O(N) and list operations are on average O(N/2), the list operations are dominated by unordered memory access time which makes them individually much more expensive than the extra N/2 operations the vector version uses. – Don Neufeld Dec 15 '09 at 06:02
  • 1
    Ok, you mentioned vector is faster under a certain size. In what case will it be that list will become faster than the vector? – jasonline Dec 15 '09 at 06:04
  • I also said I had no numbers. :-) At the time, the number was around 10,000. But that was years ago and processors and memory buses have changed since then. Fortunately, these are trivial performance tests to run. Write a test app and find out for yourself. Also take into account the @don.neufeld. – Frank Krueger Dec 15 '09 at 06:08
  • Profile it and find out, really. There's no way to know. We can only make educated guesses, and often those are wrong anyway. – GManNickG Dec 15 '09 at 06:08
  • @Frank Krueger: Thanks. I was just curious because I never considered that if a vector is faster than list, there will come a time when list will become faster than the vector due to its size. – jasonline Dec 15 '09 at 06:12
  • 1
    Bjarne Stroustrup is pretty critical of list and favors vector here : http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style – Robert Mason Mar 04 '12 at 16:52
31

Premature optimization is the root of all evil.

Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.

Don Neufeld
  • 22,720
  • 11
  • 51
  • 50
  • 1
    Yes, I know the performance doesn't matter so much in this case since it is only a few data but I was just wondering... – jasonline Dec 15 '09 at 05:59
  • 4
    When your writing small application your faced with Log(1) problems. When you've gotten the basics down, you can start to know what you need to scale too. When you know what you need to scale too, not some theoretical limit eg..."We need to scale in the future". Then you can choose algorithms and data structures to suit. Until then, just use std::vector. – Chad Dec 15 '09 at 06:24
22

It is really worth investing some time in understanding the fundamental differences between lists and vectors. The most significant difference between the two is the way they store elements and keep track of them.

- Lists -

List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.

Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed

- Vectors -

Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that element will have to be re allocated down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at position 2 in a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of a vector.

There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.

As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.

djogon
  • 281
  • 2
  • 3
6

Prefer an std::vector over and array. Some advantages of vector are:

  • They allocate memory from the free space when increasing in size.
  • They are NOT a pointer in disguise.
  • They can increase/decrease in size run-time.
  • They can do range checking using at().
  • A vector knows its size, so you don't have to count elements.

The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.

Vijay Mathew
  • 26,737
  • 4
  • 62
  • 93
  • Regarding the size, I don't think it matters in this case because I'm maintaining a fixed size of 10. I agree though that at() provides security in accessing elements. – jasonline Dec 15 '09 at 06:09
  • 1
    @jasonline Well, if all you want is to maintain a list of 10 elements and don't need the conveniences offered by the standard library (like templates), just use a statically allocated array. C++ is a flexible language, anyway! – Vijay Mathew Dec 15 '09 at 06:15
  • 2
    The question explicitly said that the table is fixed size, so I'd recommend std::array, which gives the benefits of std::vector without the overhead for resizing. Why pay for something you aren't going to use? – Adrian McCarthy Jul 16 '13 at 16:02
3

You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but you nor I can know until we profile it.

If you do, you'll find there is no overhead at all, when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can throttle cache if you're not careful.

Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.

So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.

The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.

GManNickG
  • 494,350
  • 52
  • 494
  • 543
1

First a couple of notes:

A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.

STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.

Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essntially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.

Graphics Noob
  • 9,790
  • 11
  • 46
  • 44
  • Yes, you're right. Lists don't have operator[], my mistake. What made my choice of array wrong? Because of the vector operator[] and the thought that I still need to implement the operations in an array compared to the vector. Right? – jasonline Dec 15 '09 at 06:25
  • Vijay Mathew did a good job outlining the benefits of the vector over the array, but also keep in mind that since the vector keeps it's contents contiguous in memory the array has no benefits over the vector. You aren't entirely wrong to assume that operator[] has worse performance than indexing an array, but under the hood the complier will probably reduce them to exactly the same thing. – Graphics Noob Dec 15 '09 at 09:51
0

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:

Based on this description it seems like list might be the better choice since its O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows its the wrong choice.

The questions you need to ask are three fold. The first is how often are you planning on doing deletes/inserts in the middle relative to random reads. The second is how important is using a numeric position compared to an iterator. Finally, is order in your structure important.

If the answer to the first question is random reads will be more prevalent than a vector/array will probably work well. Note iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric position than a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.

template <class T>
void erase(vector<T> & vect, int index) //note: vector cannot be const since you are changing vector
{
  vect[index]= vect.back();//move the item in the back to the index
  vect.pop_back(); //delete the item in the back
}

template <class T>
void insert(vector<T> & vect, int index, T value) //note: vector cannot be const since you are changing vector
{
  vect.push_back(vect[index]);//insert the item at index to the back of the vector
  vect[index] = value; //replace the item at index with value
}
Zachary Kraus
  • 1,051
  • 10
  • 21
0

I Believe it's as per your need if one needs more insert/to delete in starting or middle use list(doubly-linked internally) if one needs to access data randomly and addition to last element use array ( vector have dynamic allocation but if you require more operation as a sort, resize, etc use vector)