boost multi_index_container, range mutating algorithms and constness

Question

I'm using boost multi_index_container, which is queried by equal_range and the result returned from the function using range::join and boost::any_range
The any_range Reference argument is defined as const reference to the type - must be const because of the multi_index_container nature, not quite sure about the reference. Example:

typedef boost::any_range<TestData, boost::random_access_traversal_tag,
                         const TestData &, std::ptrdiff_t> TestRange;

Now what I need is to use mutating range algorithms like boost::sort, unique, etc., which obviously cannot run on range because of the constness of elements in range.
Is it any workaround except copying elements into new container?

EDIT 1:
struct and MIC example:

struct TestData {
  TestData()
      : m_strMem01("test"), m_intMem02(rand()),
        m_boolMem03(rand() < RAND_MAX / 2) {}
  std::string m_strMem01;
  int m_intMem02;
  bool m_boolMem03;
};

typedef boost::multi_index_container<
    TestData,
    bmi::indexed_by<
        bmi::random_access<bmi::tag<struct RndKey1>>,
        bmi::ordered_non_unique<
            bmi::tag<struct Key1>,
            bmi::composite_key<
                TestData,
                bmi::member<TestData, std::string, &TestData::m_strMem01>,
                bmi::member<TestData, bool, &TestData::m_boolMem03>>>,
        bmi::ordered_non_unique<
            bmi::tag<struct Key4>,
            bmi::composite_key<
                TestData,
                bmi::member<TestData, std::string, &TestData::m_strMem01>,
                bmi::member<TestData, bool, &TestData::m_intMem02>>>,
        bmi::ordered_non_unique<
            bmi::tag<struct Key2>,
            bmi::member<TestData, int, &TestData::m_intMem02>>,
        bmi::ordered_non_unique<
            bmi::tag<struct Key3>,
            bmi::member<TestData, bool, &TestData::m_boolMem03>>>>
    TestDataContainer;

You can't really sort a MIC. But the MIC will keep itself sorted, that's what you are paying for in the first place :) (You could rearrange a RA index.) — sehe, Dec 11 '14 at 11:28

Joaquín M López Muñoz · Accepted Answer · 2014-12-10T20:14:27.830

2

OK, once you have your range you really can't sort it or somehow rearrange it because the order of elements is fixed by the index --this is an absolutely fundamental invariant of indices enforced by the constness of elements, much as you'd find with say a std::set. Rather than doing a full copy to an external container you can create a more lightweight view out of pointers or references to the original elements that can later be manipulated as you wish. This is an example with views constructed as std::vectors of std::reference_wrappers to the elements:

Live On Coliru

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/range/algorithm/sort.hpp>
#include <boost/range/join.hpp>
#include <functional>
#include <iostream>
#include <vector>

using namespace boost::multi_index;

struct X
{
  int x,y;
};

std::ostream& operator<<(std::ostream& os,const X& a)
{
  return os<<"{"<<a.x<<","<<a.y<<"}";
}

typedef multi_index_container<
  X,
  indexed_by<
    ordered_non_unique<member<X,int,&X::x>>
  >
> multi_t;

struct view:std::vector<std::reference_wrapper<const X>>
{
  using base=std::vector<std::reference_wrapper<const X>>;

  template<typename InputIterator>
  view(InputIterator first,InputIterator last):base(first,last){}

  template<typename InputIterator>
  view(const std::pair<InputIterator,InputIterator> p):base(p.first,p.second){}  
};

int main()
{
  multi_t m={{0,1},{0,0},{0,2},{1,3},{1,1},{2,0},{2,1}};

  view v1=m.equal_range(0); // elements with x==0
  view v2=m.equal_range(2); // elements with x==2
  auto v3=boost::range::join(v1,v2); // join them
  boost::range::sort(v3,[](const X& a,const X& b){return a.y<b.y;}); // sort them by y
  for(const auto& x:v3)std::cout<<x<<" "; // output
}

Output:

{0,0} {2,0} {0,1} {2,1} {0,2}

edited Dec 10 '14 at 20:14

answered Dec 10 '14 at 20:07

Joaquín M López Muñoz

5,243
1
15
20

Hi, good to see you here :) I was aware of the constness, it was mentioned in MIC documentation, however I want to keep my right to complain :). Jokes aside, I'm not that familiar with reference_wrapper, I guess under the hood it is a container of pointers? So, as I mentioned before I always have this option, but again creating each time a container of 1M pointers is not the most elegant and performance oriented solution I'm looking for – kreuzerkrieg Dec 10 '14 at 20:28
what if... In my original question I mentioned sort and unique, what if I provide the user with additional indices? once the user knows how he wants the result to be sorted or uniqued (and it is finite number of options) he should query the MIC once again and get the range sorted or uniqued (or both) the way he wants. and lets say it will be 10 additional indices could you predict memory consumption/ performance impact of the whole MIC structure? – kreuzerkrieg Dec 10 '14 at 20:44
1

`reference_wrapper` is, yes, a pointer in disguise. Having 1M pointers is the price you hace to pay for doing arbitrary manipulations of the data, and I don't see how you can dispense with it. An alternative is having an extra index of type `random_access` and then transfering the arrangement of your view to that index via `rearrange`, as explained in http://www.boost.org/libs/multi_index/doc/tutorial/indices.html#rearrange . As for having ten additional indices, I don't really get your point and the overhead is in any case bigger than the 1M pointers of view-based approaches. – Joaquín M López Muñoz Dec 10 '14 at 21:35
If I get you right, rearrange is not an option, once the data is loaded and arranged it should be const and read-only since it is accessed from multiple threads, rearranging will introduce synchronization mechanism into the solution, which IMO will have unacceptable performance impact. As for additional indices see the question edit. Lets say you querying the MIC by Key1 and then you need it sorted by the m_intMem02, since you cant sort your results you just query the MIC once again by Key4. – kreuzerkrieg Dec 11 '14 at 05:04
In any case, since you mentioned that the overhead is bigger than view-based solution I accept your proposal as answer. Thanks! – kreuzerkrieg Dec 11 '14 at 05:06
`rearrange` does not introduce any synchronization mechanism per se: as any other mutating operations (like `insert`, for instance), it is up to the user to make sure execution is done in a thread-safe fashion. – Joaquín M López Muñoz Dec 11 '14 at 06:46
yes, this is what I mean - I will have to introduce some locking on rearranging. – kreuzerkrieg Dec 11 '14 at 07:16

score 1 · Answer 2 · answered Dec 10 '14 at 17:53

1

It sounds as if you don't really want your data in a multi_index container at all. The multi_index container (as it's name hints) is designed to store immutable data while maintaining several indexes (or views with constraints) against that data.

You can make part of your data mutable if you want to be able to update it, but you must ensure that the mutable parts do not participate in index construction.

If what you really want to do is mutate the data and produce new indexes as a result then you have 2 options - either remove and reinsert or (better) don't use boost::multi_index.

One approach would be to use a vector and maintain your own ad-hoc indexes.

for example, an index of unique entries in a vector of string (to keep it simple):

vector<const string*> unique_stable_index(const vector<string>& data)
{
    struct less_ptr {
        bool operator()(const string* l, const string* r) const {
            return *l < *r;
        }
    };

    vector<const string*> result;
    set<const string*, less_ptr> seen;
    for (const auto& s : data) {
        if(seen.insert(&s).second)
            result.push_back(&s);
    }

    return result;
}

answered Dec 10 '14 at 17:53

Richard Hodges

68,278
7
90
142

So maybe I wasn't clear enough, I do want the data immutable, and ad hoc is something was considered but found inferior in performance to the MIC. Back to the constness, mutable algorithms not really mutating data in terms of internal invariance of the class instance, simply they must dereference and assign, that's why they mutable, in other terms the data stays 'const' – kreuzerkrieg Dec 10 '14 at 19:02
1

Sorting and making unique views are logically indistinct from creating new indexes - the views are themselves indexes. You can avoid copies by copying references (pointers or iterators) into a container and provide a custom comparator to perform the dereference prior to compare (as above). I would be surprised if multi-index is 'faster' than maintaining separate indexes - certainly more convenient but it seems to me that multi-index is only covering part of your requirements. What am I missing? – Richard Hodges Dec 10 '14 at 19:14
as for 'faster', 'slower', check this - http://stackoverflow.com/questions/26825538/lookup-on-nested-boost-multi-index-container – kreuzerkrieg Dec 10 '14 at 19:24
as for covering requirements, there are two parts, the infra, which holds the data and provides 'queried' data to the business logic user. then the business logic developer needs to manipulate the data according to his needs, unfortunately, these needs have to manipulate data, like filtering out unneeded data which requires mutability. copying pointers was considered as last resort - we are talking about 1M items and I should do the best to provide microsecond level latency on data manipulation – kreuzerkrieg Dec 10 '14 at 19:32
I don't know how big your data is. If it's not huge, copying data is often the fastest solution in an SMP environment. Another approach might be to filter the data while building the multi-index container. i.e. rather than build the container and then filter it, filter items while they are being stored in it. This removes the need for copies of anything (data can be `move`d into a multi-index as of boost 1.57 AFAIR) – Richard Hodges Dec 10 '14 at 19:43
as I mentioned above they are aiming 1M items. that's it, I cant filter it beforehand, it acts more like sql quering, I have no idea what they will request in next 6 month, MIC makes it convenient just to add a new index when the new requirement comes – kreuzerkrieg Dec 10 '14 at 19:46
:-) No i mean how big is each individual item in the collection of 1M items? In any case the filter-while-building approach is your only solution if you wish to avoid copies or vectors-of-references. It seems to me that you're doing the equivalent of a SELECT of a SELECT if you take any other approach. – Richard Hodges Dec 10 '14 at 19:49
quite big, it includes 3 string members one of which could be pretty long (URL) and dozen of int members, so copying these items will have disastrous impact on performance but I still have the possibility to have a container of pointers. – kreuzerkrieg Dec 10 '14 at 20:01

boost multi_index_container, range mutating algorithms and constness

2 Answers2

Linked