3

I'm looking for a data structure to hold an unordered collection of unique elements, which would support the following operations

  1. insertions/deletions of elements anywhere in the collection
  2. querying if an element is present
  3. access to a random element

Naively, 1 and 2 suggest using an associative container, e.g. unordered_set, but then 3 is linear in number of elements. Using a random access container, e.g. vector, makes 3 easy, 1 can be done in O(1), but then 2 is again O(N).

The question is if there's a known way around this linear complexity?

EDIT: By a random element in 3 I mean: given an arbitrary ordering of N elements, retrieve an element number j where j is between 0 and N-1. For an std::vector it's just subscripting, for an std::list or an std::set it's incrementing the list/set iterator j times starting from begin() etc.

ev-br
  • 24,968
  • 9
  • 65
  • 78
  • Access to a random element by which key? If it's unordered, then it can't be index -- which means std::set or std::hash_map or similar. – Roger Lipscombe May 28 '12 at 21:39
  • @lezebulon: for an std::set 3 would still be O(N), would it not? – ev-br May 28 '12 at 21:45
  • 1
    No, it would be O(logn). `std::set` is implemented using red black trees. – mfontanini May 28 '12 at 21:46
  • @RogerLipscombe: by an index, yes. Given an arbitrary ordering of N elements, I want to get an element number j where j is an integer from 0 to N-1. – ev-br May 28 '12 at 21:48
  • 5
    If you want an "unordered collection", insertion and deletion "anywhere in the collection" doesn't make much sense. – juanchopanza May 28 '12 at 21:53
  • See [my answer to this question](http://stackoverflow.com/questions/3052788/how-to-select-a-random-element-in-stdset/24533580#24533580) for an approach based on a sorted vector which yields O(N) for 1, O(log N) for 2, and O(1) for 3. – davidhigh Jul 02 '14 at 14:56

4 Answers4

3

The two standard containers which are the best for your task, are - like you said, vector with 1. and 2. in O(n) and 3. in O(1) and set with 1. and 2. in O(log n) and 3. in O(n). Depending on the size of your data structure, the algorithmic complexity is not that important. A vector has the additional advantage of data locality so the CPU Cache can be exploited better.

If the actual order of the elements doesn't matter, insertion in the vector can be done in amortized O(1) (push_back) and deletion can be done in amortized O(1) if you swap the element you want to delete with the last element and delete that.

If you really have a big data structure, you could use Boost.Multi-Index to build a data structure where 1. is O(n), 2. is O(log n) and 3. is O(1). But, like I said, if your data structure is not really big a vector should just work.

If the order in the random access index does not matter, insertion can be done in amortized O(log n) (push_back). For deletion you can't use the swap trick because this would invalidate the other indices.

Florian Sowade
  • 1,727
  • 12
  • 20
1

I have been looking for such a data structure for a long time.

Recently, I found quite promising library which has all the functionality that you are looking for.

See the cntree::set with random access in O(log n).

here is the link. http://dl.dropbox.com/u/8437476/works/countertree/index.html

Although it seems to be under development, I see it is quite usable.

Sungmin
  • 2,499
  • 3
  • 26
  • 32
  • That looks interesting. What's the status of this library? – ev-br Jun 16 '12 at 12:57
  • Actually, I am not the right person to tell the exact status of the library. At least I found is that basic functionality of each container works as I expected. I used this library in several my projects, and it was okay. But I am not sure it is stable enough. – Sungmin Jun 18 '12 at 16:32
  • In the source code, there is the email of the library author. I think it would be better to ask him directly. – Sungmin Jun 18 '12 at 16:34
1

Depending on exactly what your needs are regarding #3 std::unordered_set may be quite appropriate.

I was looking for container with the above properties so that I can iterate over all elements similar to for(int i = 0; i < myset.size(); ++i) process(myset[i]);. I found this page which describes std::unordered_set::bucket_count(), std::unordered_set::begin(size_t bucket_number) and std::unordered_set::end(size_t bucket_number).

This becomes extremely handy if you have OpenMP loops, so you can write:

std::unordered_set<Element> myset;

#pragma omp parallel for
for(int i = 0; i < myset.bucket_count(); ++i) {
   for(auto it = myset.begin(i); it != myset.end(i); ++it)
      processElement(*it);
}

This still doesn't allow you direct access to myset[i], but it comes pretty close as you can access elements in numbered buckets.

David R.
  • 855
  • 8
  • 17
0

std::unordered_set. Accessing elements is not O(N) if you use the index j as the key, it's O(1).

What else were you planning on using as the key for an associative container, if you have a unique index that you want to use for lookup and you don't care about ordering otherwise?

Jonathan Wakely
  • 166,810
  • 27
  • 341
  • 521
  • does `std::unordered_set` support indexing? – ev-br May 28 '12 at 21:54
  • 1
    It supports access by a key, why not use your "index" as the key? – Jonathan Wakely May 28 '12 at 21:55
  • 1
    If you need to use something else (that you haven't mentioned) as the key and want to use something else as the "index" you could either keep a separate `std::vector::iterator>` so you can index into the vector to get an iterator to the element, or use a `boost::multi_index` (which can be used in the same way, but automatically managing the index instead of you needing to keep it separately.) – Jonathan Wakely May 28 '12 at 21:57
  • It's just keys, nothing else. What I mean by 3. is: given N keys in arbitrary order, I want to be able to retrieve an element number j where j is from 0 to N-1. And the keys themselves are not necessarily integers from 0 to N-1. – ev-br May 28 '12 at 22:00
  • @Zhenya but you say you want an unordered collection, so how does this `j` number map to an element in the collection? – juanchopanza May 28 '12 at 22:02
  • @juanchopanza: suppose the keys are in arbitrary order. Now start from the beginning of the sequence and increment an iterator `j` times. – ev-br May 28 '12 at 22:06
  • @Zhenya OK, so you don't care what element is actually in that position? You cannot assume anything about the positions of elements in an unordered collection, so you would essentially be removing a random element. – juanchopanza May 28 '12 at 22:09
  • @juanchopanza: yup, that's exactly what I'm after! In case you wonder why on earth do require that --- it's for a stochastic Monte Carlo type algorithm. – ev-br May 28 '12 at 22:11