4

By vector vs. list in STL:

  • std::vector: Insertions at the end are constant, amortized time, but insertions elsewhere are a costly O(n).

  • std::list: You cannot randomly access elements, so getting at a particular element in the list can be expensive.

I need a container such that you can both access the element at any index in O(1) time, but also insert/remove an element at any index in O(1) time. It must also be able to manage thousands of entries. Is there such a container?

Edit: If not O(1), some X << O(n)?

  • a hash table maybe? – Nikos M. May 04 '20 at 22:41
  • 5
    Because of memory behavior, `std::vector<>` is almost always the best choice regardless of algorithmic complexity. –  May 04 '20 at 22:41
  • @Frank But it is so slow with many elements! –  May 04 '20 at 22:43
  • 1
    Some good reading (and a link to some good viewing) on the topic: http://www.stroustrup.com/bs_faq.html#list – user4581301 May 04 '20 at 22:44
  • 3
    @super Have you tested the claim that "it is so slow" or are you just basing on the time complexity? It depends a lot on the platform, but usually it requires at least tens or hundreds of thousands of elements before other containers become faster. – Yksisarvinen May 04 '20 at 22:58
  • @super Did you compile with optimizations enabled, when testing, at the very least? – Algirdas Preidžius May 04 '20 at 23:08
  • @super Are you `reserve()`'ing the vector's capacity before adding elements to it? If not, then adding lots of elements takes time due to reallocating the array and moving data around. – Remy Lebeau May 04 '20 at 23:19
  • @AlgirdasPreidžius Yes, max –  May 04 '20 at 23:25
  • @RemyLebeau The vectors size remains basically constant throughout the entire algorithm –  May 04 '20 at 23:25
  • 1
    What, specifically, are you trying to do? I'm guessing that there's some other data structure specialized for your purposes that would be more useful here. – templatetypedef May 04 '20 at 23:28
  • 1
    @super that did not answer my question. You still have to populate the vector, so is your concern about the complexity *during* that population, or *after* that population? You haven't really provided any useful details about how you want to use the container you seek. – Remy Lebeau May 04 '20 at 23:35
  • @Remy Lebeau Like I said, after. First it is filled with tens of thousands of elements. This takes less than 100 ms. Then elements are inserted and deleted, and at most the container size increases by 5 during this operation, which takes 15000 ms. –  May 05 '20 at 08:05
  • @super adding even 1 element to a vector may cause a complete reallocation of its array, and removing even 1 element may cause the entire array to shift all of the elements around. 15 seconds to manipulate a vector implies either you are using an inefficient algorithm, or the shifts of elements is inefficient. Are you storing objects, or pointers to objects? If objects, do they implement move semantics? – Remy Lebeau May 05 '20 at 16:03
  • @Remy Lebeau it is a vector of list iterators, as such irreducible. Here is what I agree on. Yes, the algorithm is not the best, and I bet it could be improved, I managed to cut the time by doing sparse iterations, which turned out to be no problem for its purpose, but the algorithm itself remains the same and I at some point should improve it. However I find it highly unrealistic that reallocations like that would occur. Noone in their right mind would write an allocator like that. Unless you have solid proof of course. –  May 07 '20 at 16:01

2 Answers2

10

There's a theoretical result that says that any data structure representing an ordered list cannot have all of insert, lookup by index, remove, and update take time better than O(log n / log log n), so no such data structure exists.

There are data structures that get pretty close to this, though. For example, an order statistics tree lets you do insertions, deletions, lookups, and updates anywhere in the list in time O(log n) apiece. These are reasonably good in practice, and you may be able to find an implementation online.

Depending on your specific application, there may be alternative data structures that are more tailored toward your needs. For example, if you only care about finding the smallest/biggest element at each point in time, then a data structure like a Fibonacci heap might fit the bill. (Fibonacci heaps are usually slower in practice than a regular binary heap, but the related pairing heap tends to run extremely quickly.) If you're frequently updating ranges of elements by adding or subtracting from them, then a Fenwick tree might be a better call.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • Kudos for the theoretical result +1 – Nikos M. May 04 '20 at 22:45
  • neither list nor vector have sublinear lookup, which means we can sacrifice that one. – Mooing Duck May 04 '20 at 23:51
  • @MooingDuck I suppose it depends on what "lookup" means. I assumed this was "lookup by index," in which case the vector works on that front but isn't fast enough when it comes to insertions and deletions. – templatetypedef May 04 '20 at 23:55
  • Oh. Then one can absolutely make a container with logN insert, indexLookup, remove, and indexUpdate. I've made one. – Mooing Duck May 05 '20 at 00:22
  • Yep! That's the order statistics tree that I was referring to. (This is usually done as a balanced BST, but if you just use the shape of the order statistics tree and not the sorting functionality you get a sequence where each operation has logarithmic cost.) – templatetypedef May 05 '20 at 00:28
  • What this answer is missing is that structures that approximate the theoretical limit have a prohibitive constant overhead which make them useless in practice for almost all applications (this is famously true for Fibonacci heaps). It’s less relevant for order statistics trees but since these are just augmented binary search trees, all the properties and disadvantages of the latter apply. – Konrad Rudolph Jul 08 '20 at 10:02
0

Look at a couple of data structures.

  • The Rope
    Tree of arrays. The tree is sorted by array index for fast index search.
  • B+Tree
    Sorted tree of sorted arrays. This thing is used by almost every database ever.

Neither one is O(1) because that's impossible. But they are pretty good.

Zan Lynx
  • 53,022
  • 10
  • 79
  • 131