16

I want to store a bunch of key-value objects, but where the value object itself (and references to it) knows its key. I also want to efficiently lookup these objects given only the key.

class SomeObject
{
private:
    //String or integer. int seem cheap enough to duplicate with std::map, but
    //strings seem pretty expensive when there may be thousands of objects in existence.
    //Reference/Pointer to key is fine
    const SomeOtherObject key;
    ...other stuff...
public:
    ...methods, some of which use the key in some way...
};
  • std::map
    • Seems to require that the storage is an std::pair, such that the value cant access the key. If the value contains the key, it needs to be duplicated.
    • Does not actually enforce that the key inside the value does not get changed in some way
  • std::set
    • Looks like a really good solution, using a custom compare method to provide uniqueness by key, until you realise it made your entire value const, not just the key field.
  • std::vector (or other array/list like solutions)
    • Can use linear search, or if the items are kept sorted binary search. However I suspect this not not optimal in performance terms, and an extra layer of some kind is needed to really implement the desired behaviour with it.
Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
Fire Lancer
  • 29,364
  • 31
  • 116
  • 182
  • 1
    Sounds to me like a `map` makes the most sense, but instead of duplicating the key in the value, just include a reference to the key in the value. – Jerry Coffin Dec 11 '12 at 20:33
  • 2
    Put the key in the map as a pointer so that data isn't duplicated, then use a custom compare to compare the data pointed to instead of the pointer itself? – goji Dec 11 '12 at 20:34
  • 2
    A sorted `std::vector` does come with some performance concerns, especially if you are doing a lot of inserting/removing from the container. If you are _not_ doing that though, it comes with a lot of benefits. Mainly, reduced memory footprint and better cache locality. – Chad Dec 11 '12 at 20:38
  • Troy method is feasible, but requires attention in keeping the "key" constant (otherwise the entire data structure risk to be messed up at each further insertion / removal). – Emilio Garavaglia Dec 11 '12 at 20:38
  • 1
    Why not use a `map`, but call the `pair` your "value"? It satisfies all of your requirements. – Beta Dec 11 '12 at 20:43
  • @Beta, can the this pointer in member functions get the pair? Or can i otherwise have a custom "pair" and put the member functions there? – Fire Lancer Dec 12 '12 at 21:23
  • @Troy Introducing another pointer would take 8B/ptr on (64-bit OS). Jerry's solution, using references, will be "soft-coded" by compiler and won't make obj bigger, because reference is not an object like pointer (obj storing a memory address). Other solutions to avoid duplicating the data would be * to pass value's key to value's method whenever they are called. * to hold current key in private: static int SomeObject::currKey. It's like global, but if many methods will be called, it's faster than passing. For multithreading: store current-key/thread. Can't do if obj accesses other objects – slyy2048 Sep 08 '16 at 11:57
  • @JerryCoffin I think no one followed your suggestion which I think is great - but how do you advise to get a reference to the key and place in the value? Would you set it right after insertion? (I think that would be OK since pointers and references don't change in the lifetime of the map). It would have to be a pointer and not reference right? – haelix Mar 07 '19 at 19:53
  • @haelix: I guess it would depend a bit on the exact sort of thing involved, but I'd probably set the pointer/reference in the object's ctor. – Jerry Coffin Mar 07 '19 at 20:21
  • @JerryCoffin in the c-tor - you really can't, right? because that's executed as part of the c-tor of pair, which is executed as part of `emplace()`. This is what's unclear to me. – haelix Mar 07 '19 at 23:23

4 Answers4

9

C++14 std::set::find non-key searches

As mentioned at http://en.cppreference.com/w/cpp/container/set/find C++14 has added two new find APIs:

main.cpp

template< class K > iterator find( const K& x );
template< class K > const_iterator find( const K& x ) const;

which allow you to do:

main.cpp

#include <cassert>
#include <set>

class Point {
    public:
        // Note that there is _no_ conversion constructor,
        // everything is done at the template level without
        // intermediate object creation.
        //Point(int x) : x(x) {}
        Point(int x, int y) : x(x), y(y) {}
        int x;
        int y;
};
bool operator<(const Point& c, int x) { return c.x < x; }
bool operator<(int x, const Point& c) { return x < c.x; }
bool operator<(const Point& c, const Point& d) {
    return c.x < d;
}

int main() {
    // std::less<> because of:
    // https://stackoverflow.com/questions/20317413/what-are-transparent-comparators
    std::set<Point, std::less<>> s;
    s.insert(Point(1, -1));
    s.insert(Point(2, -2));
    s.insert(Point(0,  0));
    s.insert(Point(3, -3));
    assert(s.find(0)->y ==  0);
    assert(s.find(1)->y == -1);
    assert(s.find(2)->y == -2);
    assert(s.find(3)->y == -3);
    // Ignore 1234, find 1.
    assert(s.find(Point(1, 1234))->y == -1);
}

Tested on Ubuntu 16.10, g++ 6.2.0, with:

g++ -std=c++14 -Wall -Wextra -pedantic -o main.out main.cpp
./main.out

Using a custom class instead of less<>

This makes things a bit more explicit and allows you to write multiple comparators per class:

#include <cassert>
#include <set>

class Point {
    public:
        Point(int x, int y) : x(x), y(y) {}
        int x;
        int y;
};

struct PointCmpY {
    // https://stackoverflow.com/questions/20317413/what-are-transparent-comparators
    typedef std::true_type is_transparent;
    bool operator()(const Point& lhs, int rhs) const {
        return lhs.y < rhs;
    }
    bool operator()(int lhs, const Point& rhs) const {
        return lhs < rhs.y;
    }
    bool operator()(const Point& lhs, const Point& rhs) const {
        return lhs.y < rhs.y;
    }
};

int main() {
    std::set<Point, PointCmpY> s;
    s.insert(Point(1, -1));
    s.insert(Point(2, -2));
    s.insert(Point(0,  0));
    s.insert(Point(3, -3));
    assert(s.find(0)->x == 0);
    assert(s.find(-1)->x == 1);
    assert(s.find(-2)->x == 2);
    assert(s.find(-3)->x == 3);
    assert(s.find(Point(1234, -1))->x == 1);
}

See also

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
  • 2
    This should be the accepted answer, it works like charm. I have two questions: 1) On `bool operator<(const MemberKey& c, const MemberKey& d) {return c.key < d;}`, why not `c.key < d.key` ? I know both work, but I don't know why. 2) How to convert all this to use private members? I get lots of errors when I try ... – jav Dec 19 '17 at 10:36
  • @jav 1) my rationale was that `c.key < d` repeats the code a bit less, since the `.key` part in `d.key` specifies one more time what the key is. 2) For private fields, you would likely use a getter `.getKey()` instead of `.key`. If you don't have such getter, then it feels a bit wrong, as it would mean that the set would know about internal hidden structure of the class. Or, you can also overload `<` as a member function I think, and then it can access privates: http://en.cppreference.com/w/cpp/language/operator_comparison but this is generally frowned upon for exposing private info again. – Ciro Santilli OurBigBook.com Dec 19 '17 at 10:58
  • thanks for the quick reply 1) Actually I was wondering how does the compiler know whether to compare to `d.key` or to `d.notkey` if it's not specified, but don't worry I will try to understand it later.. 2) I tried overloading `<` as member function but this only allows checking `c.key < key` (`this` implicitly is used as first parameter), so to check `key < c.key`you need a free function and make it friend. Thanks a lot for this, you have helped me a lot.. – jav Dec 19 '17 at 15:11
  • @jav OK! 1) I think it is the `std::less<>` that makes the `set` use `<` for comparision 2) Ah, you are right, you would need a friend method. – Ciro Santilli OurBigBook.com Dec 19 '17 at 15:23
  • 1
    this can be dangerous as it allows to modify (accidentally) the key: `s.find(3)->x = 0;` -> the set is corrupted – Andriy Tylychko Feb 09 '18 at 14:18
  • @AndriyTylychko how to prevent that – Ciro Santilli OurBigBook.com Feb 09 '18 at 14:22
  • to hide the key behind a getter usually – Andriy Tylychko Feb 09 '18 at 14:27
  • 1
    @AndriyTylychko: Actually, `std::set::iterator` is defined to be a *constant* iterator type, so `s.find(3)->x` is of type `int const&` and will therefore not permit assignment. – Matt Whitlock Jul 25 '18 at 04:04
  • 1
    The problem with this is that it doesn't allow to modify the non-key part of the elements. It might be what the OP needed, or might be not. – Yakov Galka Mar 11 '19 at 17:06
  • @ybungalobill yes, that is correct. I wonder if you can ever do anything better performance-wise though than removing and re-inserting when the key of a map is modified. Interface-wise we could hide it with a dedicated map wrapper with setters and getters I suppose. – Ciro Santilli OurBigBook.com Mar 11 '19 at 17:36
  • 1
    @CiroSantilli新疆改造中心六四事件法轮功: Not with the standard containers, but you can with Boost.Intrusive. In fact I find myself using it just as frequently, if not more frequently, than the standard ones. It also simplifies exceptions handling greatly, cause only the object allocation can fail but not the insertions and deletions from the (multiple) containers. – Yakov Galka Mar 11 '19 at 17:59
2

I feel your pain. What makes me mad is that set and map are always implemented using the same data-structure under the hood, which is a tree of values parametrized with a key extractor. Unfortunately there is no such thing in the standard.

If boost is OK, use Boost.MultiIndex to achieve what you need. Take a look at Boost.Intrusive too.

Yakov Galka
  • 70,775
  • 16
  • 139
  • 220
2

C++ provides the mutable keyword that would allow you using the second solution -- a set. Declaring your value as mutable in your item class will allow modifying it even if the item is const. See also: Does the 'mutable' keyword have any purpose other than allowing the variable to be modified by a const function?

Or, even simpler, implement an accessor for your value that const_casts away the constant-ness of the item.

Community
  • 1
  • 1
krlmlr
  • 25,056
  • 14
  • 120
  • 217
  • Using the `mutable` keyword is definitely better than `const_cast`ing and hoping, that things will work correctly. – Kai Petzke Mar 20 '21 at 07:15
1

... but where the value object itself (and references to it) knows its key

Map:

The object can't know 'its' key, since a pointer to the same object may be added to several maps, using different keys. The key belongs to a map; not to an object.

Set:

What should happen when the value of this member changes? How would you force a reindexing? This is why set enforces constness.

--

You are trying to index items of a given class based on one of its members, but you don't want to copy this member for indexing purposes, and you don't want to make the object const (I assume that you do want to make the member const).

I would have built it on top of a Red-Black or AVL tree.

I'm not familiar enough with Boost.MultiIndex suggested by ybungalobill. I'm not sure whether its instantiated code copies the indexed member, or how it handles values changes for this member.

Lior Kogan
  • 19,919
  • 6
  • 53
  • 85
  • Well isnt that basically what map/set come down to in most implementations? Is there truly no way to extract this behaviour out of them? – Fire Lancer Dec 12 '12 at 21:47
  • @FireLancer: AFAIK, you can't do it. Using STL's internal tree implementation is not practical either (at least not in a portable way - see http://stackoverflow.com/questions/11381157/using-stls-internal-implementation-of-red-black-tree). – Lior Kogan Dec 13 '12 at 07:12