25

I've got a set of data that I need to store in an ordered map (i.e. with efficient insertion, deletion, and locating items by key), but I also need to be able to find the nth element without walking through the entire map (there may sometimes be tens of thousands of items in it).

I know of one way to do it: use a red/black tree, but keep the total number of child items on one of the legs of each node as well. It makes insertion and deletion a little slower (because you have to update the counts on each node along the path as you do), but you can find the nth element for any n in roughly the same time as finding a key.

I'm wondering if there's an existing C++ implementation of such a thing that I can use. I can write it myself if not, but I'd really rather not.


EDIT: I got some clarification on the use-case for it. I misunderstood it slightly: after looking up an item by key, they need the ability to efficiently find out what index the found item is, to properly display the scroll bars.

It is a legitimate need, and the data structure I described above will still work for it, so I'm still looking for an answer. But as it seems that no one has come up with one yet, I'm going to start coding it myself.

Head Geek
  • 38,128
  • 22
  • 77
  • 87
  • 2
    The standard library implementations don't support this, but you're right that an augmented R/B tree would work. I don't know any implementations of this, though. :-( – templatetypedef Jan 14 '11 at 00:45
  • 4
    Sounds like you picked the wrong container. Maps are indexed by key, *not* by some arbitrary "position from the start". – Lightness Races in Orbit Jan 14 '11 at 01:00
  • 7
    @Tomalak: Sounds like they're in the process of picking the right container, and can't find their requirements in the stdlib or elsewhere. – Fred Nurk Jan 14 '11 at 01:07
  • 3
    @FredNurk: I agree. However, from what I've read, I believe that their requirements indicate a "design smell". There is, after all, a good reason that they haven't found anything yet that matches their "requirements". – Lightness Races in Orbit Jan 14 '11 at 01:11
  • 2
    And why exactly do you need to find the nth element quickly? – Karl Knechtel Jan 14 '11 at 01:21
  • 1
    @Karl: because that's what the requirements specified. :-) I believe they want to use it to display (a subset of) the items in it, starting from an arbitrary position. Though now that I think about it, I'm not sure where they would get that arbitrary position from. – Head Geek Jan 14 '11 at 02:04
  • 1
    This does sound like a poor design. If they just want to be able to start from a specific point for display purposes, then why can't they just keep an iterator into a map? That way lookup time is O(1), can't get any faster than that. – Grant Peters Jan 14 '11 at 02:09
  • It's the "starting from an arbitrary position" that's driving that requirement. But as I said, I don't know where that value comes from. – Head Geek Jan 14 '11 at 02:11
  • What's the expected relative frequency of insertion, deletion, and lookups? – dan04 Jan 14 '11 at 03:14
  • 1
    If its just "starting at" add: iterator GetIndex(i) { iterator iter = begin(); while(--i) { ++iter; } return iter; } -- As long as they use the iterator after that, then its an O(1) lookup for each element retrieved in sequence after an initial O(i) lookup – Grant Peters Jan 14 '11 at 16:54
  • 4
    Here is a similar question. http://stackoverflow.com/questions/2290429/rank-tree-in-c So we know how to do it (it's in the "Augmenting Data Structures" chapter of Cormen, Leiserson, Rivest) but it seems like the C++ stl tree doesn't allow you to add fields to the tree node. Maybe boost's "intrusive" library? I've never used that. I'd also end up writing the tree myself, I think. – Rob N Jan 14 '11 at 23:02
  • @Rob: And I thought I'd come up with something unique, when I first came up with it (around '93). :-( Oh well. If I'd seen that other question earlier, it might have saved me a lot of work. As it is, I'm almost done implementing it, so I'll continue writing my own. – Head Geek Jan 15 '11 at 01:14
  • @Rob: I don't know whether there's a way to do it with a Boost library, but I'd prefer to write it myself. Even if there is, it might lock my solution to a particular STL implementation, and for a cross-platform solution, that's not good. – Head Geek Jan 15 '11 at 01:16
  • 4
    @Tomalak: So needing anything that does not exist in the standard library is a "design smell"? – zvrba May 13 '11 at 10:45
  • @zvrba: That's not at all what I said. – Lightness Races in Orbit May 13 '11 at 13:34

10 Answers10

5

This is my answer of other question considering similar issue.

associative / random access container

I guess this might also apply to your question.


I have been looking for such a data structure for a long time.

Recently, I found quite promising library which has all the functionality that you are looking for.

See the cntree::set with random access in O(log n).

here is the link. http://dl.dropbox.com/u/8437476/works/countertree/index.html

Although it seems to be under development, I see it is quite usable.

Community
  • 1
  • 1
Sungmin
  • 2,499
  • 3
  • 26
  • 32
  • That sounds like exactly what I needed. Too bad it's a year and a half too late. :-) I ended up writing my own, which works very well indeed. I'm going to mark this as the accepted answer, since it's the only one that really answered the question, late or not. – Head Geek Jun 18 '12 at 21:58
  • 1
    Second link is dead. – luizfls Aug 10 '20 at 04:38
4

If you used a modified Trie where non-terminal nodes kept track of how many terminal nodes were below it, you could do quick ordered lookup.

Null Set
  • 5,374
  • 24
  • 37
2

I've never used boost::multi_index_container<>, but it sounds like it might have the capability to do what you want (though I'm not really sure - it's a pretty complex library at first glance).

It has a random access key type, but I'm not sure how you'd update the random index in a way that keeps inserted element's index synchronized with the other index's ordering. Also, note the following from the tutorial on using a random index:

This added flexibility comes at a price: insertions and deletions at positions other than the end of the index have linear complexity, whereas these operations are constant time for sequenced indices. This situation is reminiscent of the differences in complexity behavior between std::list and std::vector: in the case of random access indices, however, insertions and deletions never incur any element copying, so the actual performance of these operations can be acceptable, despite the theoretical disadvantage with respect to sequenced indices.

It's unclear to me whether that would be a deal killer for you or not, even if you can manage to synchronize the random index for inserted elements the way you'd like.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • 1
    Thanks. That was one of the first things I considered, but from the documentation (including the part you quoted), it looks like it uses an actual vector internally, which would destroy the insertion/deletion performance. – Head Geek Jan 14 '11 at 02:06
  • A random access index presents elements in insertion order, not ordered by a Compare – Caleth Aug 11 '20 at 16:40
1

Late to the party (hit this question while looking for something related) - but wouldn't a sorted vector fit the use case here? Insertion time is worse - unless you do most/ all of your insertions in one batch before sorting. After that look-up time can actually beat std::map - and getting the index is trivial.

philsquared
  • 22,403
  • 12
  • 69
  • 98
  • The insertion time is a deal-breaker. It needs to insert things regularly, and it can only batch them to a limited degree. – Head Geek May 13 '11 at 23:11
  • I like this answer because it highlights an important point one should definitely consider before committing to more tricky solutions! – AndreyS Scherbakov Feb 09 '20 at 03:44
0

One option would be to develop a container that is based on the std::vector, but also has the map interface. It would store a separate hashtable or binary tree that uses the elements' keys to access them, but the actual values would be pointers into the internal array used by the vector.

Such a monstrosity may seem pointless, error-prone, or a design smell by some people, but such a data structure does have its place. I have seen this used in code for hardware drivers in retail systems, where two users of a container need to access it different ways. When used "because it is there" it is a bad thing, but it is a lifesaver when used correctly.

  • 2
    That doesn't seem as time-efficient as the map-with-child-counts design I described, at least in this case. – Head Geek Jan 15 '11 at 01:11
0

Silly idea: Also keep keys in indexed skip list with O(log n) insertion and random access. See http://cglab.ca/~morin/teaching/5408/refs/p90b.pdf

This is not any more efficient than using order statistic trees and requires more space and bookkeeping.

See also: Rank Tree in C++

qwr
  • 9,525
  • 5
  • 58
  • 102
0

Check out boost::container::flat_map[1] which is a map based on a vector-like container[2].

With the underlying random-access iterators, you get O(1) lookup; the nth[3] member function helps you get an iterator to the nth element.


[1] Boost.Container
[2] Documentation on flat_map
[3] boost::container::flat_map::nth

luizfls
  • 532
  • 6
  • 13
0

You can also use a policy-based data structure if using g++

Example:

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional> // for less

using namespace __gnu_pbds;
using namespace std;

typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> pbds;

int main() {
    pbds a;
    a.insert(4);
    a.insert(5);
    a.insert(1);
    
    // value at index 1
    cout << a.find_by_order(1)->second << endl; // prints 4
    
    // index of number 5
    cout << p.order_of_key(5) << endl;

    return 0;
}

Both the functions order_of_key and find_by_order work in logarithmic time.

Source: Geeks from Geeks The g++ compiler also supports some data structures that are not part of the C++ standard library. Such structures are called policy-based data structures. These data structures are designed for high-performance, flexibility, semantic safety, and conformance to the corresponding containers in std.

Vishal Singh
  • 1,341
  • 14
  • 13
-2

Try using an ordered std::list and use std::binary_search for searching. An ordered list can be implemented using std::list and inserting nodes using std::lower_bound. There are many examples of this on the web, and on SO.

  • 3
    I could be wrong about this, but don't these algorithms degrade to O(n) performance on forward, non-random-access iterators like list iterators? See the complexity section at http://cplusplus.com/reference/algorithm/binary_search/ – templatetypedef Jan 14 '11 at 01:23
  • 1
    -1 Lists are slow for binary search as the iterators are not random access. It will have to dereference every node on the way to item 'n' so you will lose a LOT of speed as it will continually be jumping around memory and thrashing the cache. – Grant Peters Jan 14 '11 at 01:25
  • 4
    Sorry about the rude welcome to the community, but this is the way of stack overflow. If something is bad, it gets downvoted, no matter the intentions. Code quality reigns supreme here, any less would be detrimental to the community. Please don't take this as a criticism, but a chance to learn something new. Post again, we can only learn from our mistakes. I wear my bad answers/questions as a mark of pride! I am not infallible which means I always have something to improve (if this wasn't the case, I think I'd change to a totally different career). – Grant Peters Jan 14 '11 at 16:58
-2

MS VC STL map backed by red black tree.

I don't think it is possible to have efficient search (by key) and efficient random access in the same data structure.

If the efficient random access is really important it would be better to store data in vector-like random access container. Ordering and key searching can be accomplished with additional indexes. RDBMSs are doing like this.

Or, if the insertion/deletion is more important it seems avoidable to manage something like key array (or row number index) for random accesses.

9dan
  • 4,222
  • 2
  • 29
  • 44
  • As far as I can tell, it's *all* important. And yes, it's possible -- I can see exactly how to do it, I just really don't want to spend the time to write it myself. – Head Geek Jan 14 '11 at 13:00