Create multiple indexes into a large collection of objects with smart pointers

Question

I am creating multiple indexes (ie, that use different keys) into a large collection of objects. The objects can change, and the collection can shrink and grow. My thoughts so far:

Keep multiple collections of some kind of pointers to the objects. Use set instead of map for better encapsulation. Use unordered_set to scale well with large data sets. The pointers should ideally all be some form of smart pointer.

I can start off fairly easily with a master collection of unique_ptrs, which manage all allocations, and secondary indexes that use "raw" pointers (I'll leave out the supporting functions for the moment, but note that the index is a multiset as its key won't be unique across the set):

typedef boost::unordered_set< boost::unique_ptr<MyObject>,myobject_hash,myobjects_equal > MyObjects;
typedef boost::unordered_multiset<const MyObject*,myobject_index2_hash,myobject_index2_equal > MyObjectsIndex2;

Usage is straightforward:

MyObjects my_objects;
MyObjectsIndex2 my_objects_index2;

auto it_mo = my_objects.insert(
    boost::unique_ptr<MyObject>(
        new MyObject(...)
    )
);
const MyObject* p_mo = it_mo.first->get();
my_objects_index2.insert(p_mo);

I am considering putting in extra effort to replace the index's use of raw pointers with const references to the unique_ptrs of the master collection. I'm not sure I can though, at least not easily. I thought I would ask if anyone else had gone that route already, or had alternative suggestions.

UPDATE

Lessons learned so far:

Datastore classes are cool
reference_wrappers are cool
A xx_set with a "key" object datastore member var is more space-efficient than xx_map. BUT... you can't easily use a unique_ptr as a key in c++11. c++14 apparently may have better set functionality via std::set<Key>::find. See here for more details. So for now, a datastore that manages raw allocations seems to make more sense here than trying to force use of a unique_ptr as a set key, or increasing keyspace storage with maps.
Remember to force key values to be const for the life of the object (use const values provided in constructor)

References to the unique pointers won't work because those pointers will tend to move around as the main storage structure reallocates as it grows (unless it is somehow written to keep objects in place). You can't really get around needing a second layer of indirection here unless you're willing to store multiple copies of the objects, which wastes a lot of space. Anyway, I think you would be better off with a shared pointer in the main store, and weak pointers in the indices. — Wug, May 08 '15 at 21:23
Thanks Wug, good point about the moving unique_ptrs, that would be another detail to manage. And yes the shared + weak pointer approach would work but with the expense of reference counting. Which isn't bad. I'm kind of obsessing about this because it's so fundamental. The raw pointer approach seems ok I think... still mulling it over in my head... — moodboom, May 08 '15 at 21:38

score 1 · Answer 1 · answered May 08 '15 at 22:27

Here's one way.

std::vector<unique_ptr> to hold the data items (to ensure that the addresses don't change when the vector resizes) and then containers holding reference_wrappers (copyable references) to make the indexes.

compilable example:

#include <map>
#include <vector>
#include <set>
#include <string>
#include <functional>
#include <memory>
#include <iostream>

struct Thing {
    Thing(std::string name, int value)
    : _name { std::move(name) }
    , _value { value }
    {}

    const std::string& name() const {
        return _name;
    }

    void write(std::ostream& os) const {
        os << "{ " << _name << " : " << _value << " }";
    }    
private:
    std::string _name;
    int _value;
};

inline std::ostream& operator<<(std::ostream& os, const Thing& t) {
    t.write(os);
    return os;
}

struct multi_index
{
    using multi_by_name_index = std::multimap<std::string, std::reference_wrapper<Thing>>;

    void add_thing(std::string name, int value) {

        // todo: checks to ensure that indexes won't be violated

        // add a new thing to the main store
        _main_store.emplace_back(new Thing{std::move(name), value});

        // store a reference to it in each index
        auto& new_thing = *(_main_store.back().get());
        _name_index.emplace(new_thing.name(), new_thing);
    }

    using multi_by_name_range = std::pair<multi_by_name_index::const_iterator, multi_by_name_index::const_iterator>;
    multi_by_name_range get_all_by_name(const std::string name) const
    {
        return _name_index.equal_range(name);
    }

private:
    std::vector<std::unique_ptr<Thing>> _main_store;
    std::multimap<std::string, std::reference_wrapper<Thing>> _name_index;
};

using namespace std;

int main()
{
    multi_index mi;

    mi.add_thing("bob", 8);
    mi.add_thing("ann", 4);
    mi.add_thing("bob", 6);

    auto range = mi.get_all_by_name("bob");
    for( ; range.first != range.second ; ++range.first) {
        cout << range.first->second << endl;
    }

   return 0;
}

expected output:

{ bob : 8 }                                                                                                                             
{ bob : 6 }

There is `std::unordered_multimap` as well, which is probably even better for this. Also, given that the `std::unique_ptr`s are entirely hidden from the callers, is there any point using them? Why not just use, for example, `std::unordered_set`, instead of `std::vector>`? — Wug, May 08 '15 at 23:16
Unordered: yes that's fine. You just have to make sure there's an operator== and a hash function. Set as main store? That would mean that Thing would need to be unique and would have to provide a < operator. I didn't want to constrain it in that way. — Richard Hodges, May 08 '15 at 23:25
Thanks for the reference_wrappers tip, cool! I suppose that's the best way to "show" that the allocation is handled elsewhere. Do you know if there is any overhead to a reference_wrapper over using a raw pointer? — moodboom, May 09 '15 at 02:30
You're welcome. No, there's no overhead at all. Internally it's a pointer. The reference wrapper gives it the interface of a reference so it behaves well with std algorithms etc — Richard Hodges, May 09 '15 at 07:37

score 1 · Answer 2 · answered May 08 '15 at 23:04

I recognize that your use case is probably different from the one I contrived for my example, and without a lot more details I won't be able to make one that matches (I also think that if you had a lot of details you would be able to find a solution yourself).

#include <iostream>
#include <map>
#include <set>
#include <memory>
#include <stdexcept>

using namespace std;

class Thing
{
public:
    Thing() = default;
    Thing(const Thing &other) = default;
    Thing(int i, string p, string d) : id(i), desc(d), part(p) {}

    int    id;
    string desc;
    string part;
};

ostream &operator<<(ostream &out, const Thing &t)
{
    if (&t == NULL) out << "(NULL)"; // don't judge me
    else out << t.id << ": " << t.part << " (" << t.desc << ")";
}

class Datastore
{
public:
    Datastore() = default;
    shared_ptr<const Thing> Add(const Thing &t)
    {
        if (!(index_bydesc.find(t.desc) == index_bydesc.end() &&
              index_bypart.find(t.part) == index_bypart.end() &&
              index_byid.find(t.id) == index_byid.end()))
            throw runtime_error("Non-unique insert");
        shared_ptr<const Thing> newt = make_shared<const Thing>(t);
        weak_ptr<const Thing> weak = weak_ptr<const Thing>(newt);
        index_bydesc[newt->desc] = weak;
        index_bypart[newt->part] = weak;
        index_byid[newt->id] = weak;
        store.insert(newt);
        return newt;
    }

    void Remove(const Thing &t)
    {
        shared_ptr<const Thing> p = FindBy_Desc(t.desc);
        store.erase(p);
        index_bydesc.erase(p->desc);
        index_bypart.erase(p->part);
        index_byid.erase(p->id);
    }

    shared_ptr<const Thing> FindBy_Desc(string desc)
    {
        map<string, weak_ptr<const Thing> >::iterator iter = index_bydesc.find(desc);
        if (iter == index_bydesc.end()) return shared_ptr<const Thing>();
        return iter->second.lock();
    }

    // index accessors for part and quantity omitted

private:
    std::set<shared_ptr<const Thing> > store;

    std::map<string, weak_ptr<const Thing> > index_bydesc;
    std::map<string, weak_ptr<const Thing> > index_bypart;
    std::map<int, weak_ptr<const Thing> > index_byid;
};

int main() {
    Datastore d;
    d.Add(Thing(1, "TRNS-A", "Automatic transmission"));
    d.Add(Thing(2, "SPKPLG", "Spark plugs"));
    d.Add(Thing(3, "HOSE-S", "Small hoses"));
    d.Add(Thing(4, "HOSE-L", "Large hoses"));
    d.Add(Thing(5, "BATT-P", "Primary battery (14.5v nominal)"));
    d.Add(Thing(6, "BATT-S", "Secondary batteries (1.5v nominal)"));
    d.Add(Thing(7, "CRKSFT", "Crank shaft"));
    d.Add(Thing(8, "REAC-F", "Fusion reactor power source"));

    cout << *d.FindBy_Desc("Crank shaft") << endl;
    d.Remove(*d.FindBy_Desc("Crank shaft"));
    cout << *d.FindBy_Desc("Crank shaft") << endl;
    return 0;
}

Shortcomings:

Storage structure is read-only. This is a necessary shortcoming, because the index will become outdated if you modify the indexed fields of an object while it's in the datastore. To modify an object, remove it and then re-add another one.
All fields must be unique. This is easily changeable, but you need to keep maps containing list<Thing> as indices for non-unique fields, not just maps containing Thing.
Performance problems relating to using std::map. std::unordered_map is an alternative with better (constant amortized) access times for huge data structures (the same as std::unordered_set).

Deviations:

Given that you have a clear key-value relationship here, I think you'd be better off with a map than a set.
In order to solve performance problems relating to reference counting, if you are careful to maintain the internal consistency at all times, you could forgo all smart pointers for raw ones, and return values via references, and you could achieve further improvements by using unsafe object ownership semantics when filling it (i.e., pass it pointers to heap objects that the Datastore then takes ownership of). More complex, but ultimately fewer copies and less runtime overhead.

Thanks Wug, that's nice and clean! I'm definitely going to use your object store pattern. Now I have to decide - raw pointers, or reference counting. :-) — moodboom, May 09 '15 at 02:11

Create multiple indexes into a large collection of objects with smart pointers

2 Answers2

Shortcomings:

Deviations:

Linked