10

I know how to sort a vector of pairs, but how do you sort a pair of vectors? I can think of writing a custom "virtual" iterator over a pair of vectors and sorting that, but that seems quite complex. Is there an easier way? Is there one in C++03? I would like to use std::sort.

This problem arises when processing some data generated in hardware, where a pair of arrays makes more sense than array of pairs (since then there would be all kinds of stride and alignment problems). I realize that otherwise keeping a pair of vector instead of a vector of pairs would be a design flaw (the structure of arrays problem). I'm looking for a fast solution, copying the data to a vector of pairs and then back (I will return it to the HW to do more processing) is not an option.

Example:

keys   = {5, 2, 3, 1, 4}
values = {a, b, d, e, c}

and after sorting (by the first vector):

keys   = {1, 2, 3, 4, 5}
values = {e, b, d, c, a}

I refer to a "pair of vectors" as the pair of keys and values (stored as e.g. std::pair<std::vector<size_t>, std::vector<double> >). The vectors have the same length.

the swine
  • 10,713
  • 7
  • 58
  • 100
  • What do you mean by "sorting vector of pairs" ? Sorting by which criteria? Give an example. – Eugene Sh. Nov 20 '14 at 20:38
  • What does sorting a pair of vectors mean? – Daniel Nov 20 '14 at 20:39
  • 2
    @Daniel I think it means sorting both vectors so that the pairing remains identical between them, based on the values from one of the vectors. – Mark Ransom Nov 20 '14 at 20:42
  • @MarkRansom Yes, that is actually very well explained. – the swine Nov 20 '14 at 20:43
  • EDIT: Nevermind. Mark answered my question. – caps Nov 20 '14 at 20:43
  • 1
    @caps No, not independently. The output should be the same as if making a vector of `std::pair`, sorting that, and then splitting it again. Note that whether it is sorted by the first vector, the second vector or both, probably does not matter, as that is a task of a comparison operator. – the swine Nov 20 '14 at 20:44
  • Do you have access to C++11 or are you restricted to C++03? – caps Nov 20 '14 at 20:44
  • Is it necessary to sort them in-place? I'm thinking maybe you could make a vector of indices and sort that based on the `keys` entries. – Fred Larson Nov 20 '14 at 20:45
  • @FredLarson A vector of indices is an interesting idea, that can certainly work without the custom iterator. However, it would double the memory use and decrease memory traffic, making it not the fastest solution. It would be remarkably simple, though. – the swine Nov 20 '14 at 20:47
  • @caps If you can do it in C++11, that will work. I would merely prefer C++03. – the swine Nov 20 '14 at 20:47
  • Related, if not duplicate, question: http://stackoverflow.com/q/3909272/10077 – Fred Larson Nov 20 '14 at 20:54
  • See also: http://stackoverflow.com/q/236172/10077 – Fred Larson Nov 20 '14 at 20:55
  • 1
    @FredLarson both of these are OK, and is what I typically end up using, but you'd think there would be a more elegant solution that doesn't require allocating another full set of integers...a writable zip iterator would do the trick, but I think you'd have to roll your own. – IdeaHat Nov 20 '14 at 21:00
  • @FredLarson That's a very nice find, I obviously did not search using correct words, however accepted answers to both involve first sorting indices and then permuting, which more than doubles memory use and reduces speed. – the swine Nov 20 '14 at 21:00
  • 1
    I don't think there's an option for using `std::sort`. If you insist on sorting the values in-place, I think you'll have to implement your own sort algorithm. `std::sort` will let you plug in the comparison, but not the swap operation. – Fred Larson Nov 20 '14 at 21:03
  • @FredLarson couldn't he write his own `swap` method and use his own iterators? – caps Nov 20 '14 at 21:05
  • @caps: If you could come up with an iterator type to do it, you might not even need a separate comparator. I'd have to think about how to implement the iterator. It might be possible. – Fred Larson Nov 20 '14 at 21:08
  • 4
    @IdeaHat if `std::sort` exposed a functor for the swap as well as the comparison, it would be trivial. Unfortunately it doesn't. I wonder how hard it would be to copy the library implementation of `std::sort` and make that one change? – Mark Ransom Nov 20 '14 at 21:13
  • 1
    I played around with it some but I am not good enough to get something compiling. I think you might be able to pull it off with custom iterator that looks something like class custom_iterator : public std::iterator { private: typename std::vector::iterator T_itr; typename std::vector::iterator char_itr; typename std::vector::iterator end_char; //to store .end() in order to keep char_itr from iterating past it //etc. } – caps Nov 20 '14 at 22:14
  • Nobody suggested std::map?? – Neil Kirk Nov 20 '14 at 22:30
  • @NeilKirk That would be horribly slow and use lots of memory. Or maybe I don't see how would you use it? – the swine Nov 20 '14 at 22:51
  • How many elements do you have? – Neil Kirk Nov 20 '14 at 22:57
  • @NeilKirk Orders of gigabytes of `size_t` / `double`. Many. – the swine Nov 20 '14 at 23:19
  • Ouch. Could you use `vector>` instead? Then you can use a custom sort. – Neil Kirk Nov 20 '14 at 23:39
  • @NeilKirk No, I couldn't, that's the point of the question. Otherwise it would be trivial :). – the swine Nov 20 '14 at 23:41

4 Answers4

7

Let's make a sort/permute iterator, so that we can just say:

int  keys[] = {   5,   2,   3,   1,   4 };
char vals[] = { 'a', 'b', 'd', 'e', 'c' };

std::sort(make_dual_iter(begin(keys), begin(vals)), 
          make_dual_iter(end(keys), end(vals)));

// output
std::copy(begin(keys), end(keys), std::ostream_iterator<int> (std::cout << "\nKeys:\t",   "\t"));
std::copy(begin(vals), end(vals), std::ostream_iterator<char>(std::cout << "\nValues:\t", "\t"));

See it Live On Coliru, printing

Keys:   1   2   3   4   5   
Values: e   b   d   c   a   

Based on the idea here, I've implemented this:

namespace detail {
    template <class KI, class VI> struct helper { 
        using value_type = boost::tuple<typename std::iterator_traits<KI>::value_type, typename std::iterator_traits<VI>::value_type>;
        using ref_type   = boost::tuple<typename std::iterator_traits<KI>::reference,  typename std::iterator_traits<VI>::reference>; 

        using difference_type = typename std::iterator_traits<KI>::difference_type;
    };
}

template <typename KI, typename VI, typename H = typename detail::helper<KI, VI> > 
class dual_iter : public boost::iterator_facade<dual_iter<KI, VI>, // CRTP
    typename H::value_type, std::random_access_iterator_tag, typename H::ref_type, typename H::difference_type> 
{ 
public: 
    dual_iter() = default;
    dual_iter(KI ki, VI vi) : _ki(ki), _vi(vi) { } 

    KI _ki; 
    VI _vi; 

private: 
    friend class boost::iterator_core_access; 

    void increment() { ++_ki; ++_vi; } 
    void decrement() { --_ki; --_vi; } 

    bool equal(dual_iter const& other) const { return (_ki == other._ki); } 

    typename detail::helper<KI, VI>::ref_type dereference() const { 
        return (typename detail::helper<KI, VI>::ref_type(*_ki, *_vi)); 
    } 

    void advance(typename H::difference_type n) { _ki += n; _vi += n; } 
    typename H::difference_type distance_to(dual_iter const& other) const { return ( other._ki - _ki); } 
}; 

Now the factory function is simply:

template <class KI, class VI> 
    dual_iter<KI, VI> make_dual_iter(KI ki, VI vi) { return {ki, vi}; }

Note I've been a little lazy by using boost/tuples/tuple_comparison.hpp for the sorting. This could pose a problem with stable sort when multiple key values share the same value. However, in this case it's hard to define what is "stable" sort anyways, so I didn't think it important for now.

FULL LISTING

Live On Coliru

#include <boost/iterator/iterator_adaptor.hpp>
#include <boost/tuple/tuple_comparison.hpp>

namespace boost { namespace tuples {

    // MSVC might not require this
    template <typename T, typename U>
    inline void swap(boost::tuple<T&, U&> a, boost::tuple<T&, U&> b) noexcept {
        using std::swap;
        swap(boost::get<0>(a), boost::get<0>(b));
        swap(boost::get<1>(a), boost::get<1>(b));
    }

} }

namespace detail {
    template <class KI, class VI> struct helper { 
        using value_type = boost::tuple<typename std::iterator_traits<KI>::value_type, typename std::iterator_traits<VI>::value_type>;
        using ref_type   = boost::tuple<typename std::iterator_traits<KI>::reference,  typename std::iterator_traits<VI>::reference>; 

        using difference_type = typename std::iterator_traits<KI>::difference_type;
    };
}

template <typename KI, typename VI, typename H = typename detail::helper<KI, VI> > 
class dual_iter : public boost::iterator_facade<dual_iter<KI, VI>, // CRTP
    typename H::value_type, std::random_access_iterator_tag, typename H::ref_type, typename H::difference_type> 
{ 
public: 
    dual_iter() = default;
    dual_iter(KI ki, VI vi) : _ki(ki), _vi(vi) { } 

    KI _ki; 
    VI _vi; 

private: 
    friend class boost::iterator_core_access; 

    void increment() { ++_ki; ++_vi; } 
    void decrement() { --_ki; --_vi; } 

    bool equal(dual_iter const& other) const { return (_ki == other._ki); } 

    typename detail::helper<KI, VI>::ref_type dereference() const { 
        return (typename detail::helper<KI, VI>::ref_type(*_ki, *_vi)); 
    } 

    void advance(typename H::difference_type n) { _ki += n; _vi += n; } 
    typename H::difference_type distance_to(dual_iter const& other) const { return ( other._ki - _ki); } 
}; 

template <class KI, class VI> 
    dual_iter<KI, VI> make_dual_iter(KI ki, VI vi) { return {ki, vi}; }

#include <iostream>
using std::begin;
using std::end;

int main()
{
    int  keys[] = {   5,   2,   3,   1,   4 };
    char vals[] = { 'a', 'b', 'd', 'e', 'c' };

    std::sort(make_dual_iter(begin(keys), begin(vals)), 
              make_dual_iter(end(keys), end(vals)));

    std::copy(begin(keys), end(keys), std::ostream_iterator<int> (std::cout << "\nKeys:\t",   "\t"));
    std::copy(begin(vals), end(vals), std::ostream_iterator<char>(std::cout << "\nValues:\t", "\t"));
}
sehe
  • 374,641
  • 47
  • 450
  • 633
  • 2
    That's fairly neat using Boost. Let me see if I get it straight, the `boost::iterator_facade` implements all the iterator functions for you, based only on `increment`, `decrement`, `equal`, `dereference`, `advance` and `distance_to`? If so, that's pretty useful, judging from how much longer my boost-less dual iterator is. The comparison is not a problem, I could easily go and write my own comparison predicate for `std::sort`. – the swine Nov 21 '14 at 08:17
  • I'd say "indeed" on all accounts :) – sehe Nov 21 '14 at 09:49
  • Dare to write a variadic template version of dual_iter? – dalle Sep 12 '15 at 19:33
  • 1
    @dalle sure: http://coliru.stacked-crooked.com/a/95de0569a933fde5 (I used c++14 so I don't have to write `[make_]index_sequence<>`. I think there's a lot of opportunity to further simplify if you want to really use c++14) – sehe Sep 12 '15 at 20:34
4

Just for comparison, this is how much code the split iterator approach requires:

template <class V0, class V1>
class CRefPair { // overrides copy semantics of std::pair
protected:
    V0 &m_v0;
    V1 &m_v1;

public:
    CRefPair(V0 &v0, V1 &v1)
        :m_v0(v0), m_v1(v1)
    {}

    void swap(CRefPair &other)
    {
        std::swap(m_v0, other.m_v0);
        std::swap(m_v1, other.m_v1);
    }

    operator std::pair<V0, V1>() const // both g++ and msvc sort requires this (to get a pivot)
    {
        return std::pair<V0, V1>(m_v0, m_v1);
    }

    CRefPair &operator =(std::pair<V0, V1> v) // both g++ and msvc sort requires this (for insertion sort)
    {
        m_v0 = v.first;
        m_v1 = v.second;
        return *this;
    }

    CRefPair &operator =(const CRefPair &other) // required by g++ (for _GLIBCXX_MOVE)
    {
        m_v0 = other.m_v0;
        m_v1 = other.m_v1;
        return *this;
    }
};

template <class V0, class V1>
inline bool operator <(std::pair<V0, V1> a, CRefPair<V0, V1> b) // required by both g++ and msvc
{
    return a < std::pair<V0, V1>(b); // default pairwise lexicographical comparison
}

template <class V0, class V1>
inline bool operator <(CRefPair<V0, V1> a, std::pair<V0, V1> b) // required by both g++ and msvc
{
    return std::pair<V0, V1>(a) < b; // default pairwise lexicographical comparison
}

template <class V0, class V1>
inline bool operator <(CRefPair<V0, V1> a, CRefPair<V0, V1> b) // required by both g++ and msvc
{
    return std::pair<V0, V1>(a) < std::pair<V0, V1>(b); // default pairwise lexicographical comparison
}

namespace std {

template <class V0, class V1>
inline void swap(CRefPair<V0, V1> &a, CRefPair<V0, V1> &b)
{
    a.swap(b);
}

} // ~std

template <class It0, class It1>
class CPairIterator : public std::random_access_iterator_tag {
public:
    typedef typename std::iterator_traits<It0>::value_type value_type0;
    typedef typename std::iterator_traits<It1>::value_type value_type1;
    typedef std::pair<value_type0, value_type1> value_type;
    typedef typename std::iterator_traits<It0>::difference_type difference_type;
    typedef /*typename std::iterator_traits<It0>::distance_type*/difference_type distance_type; // no distance_type in g++, only in msvc
    typedef typename std::iterator_traits<It0>::iterator_category iterator_category;
    typedef CRefPair<value_type0, value_type1> reference;
    typedef reference *pointer; // not so sure about this, probably can't be implemented in a meaningful way, won't be able to overload ->
    // keep the iterator traits happy

protected:
    It0 m_it0;
    It1 m_it1;

public:
    CPairIterator(const CPairIterator &r_other)
        :m_it0(r_other.m_it0), m_it1(r_other.m_it1)
    {}

    CPairIterator(It0 it0 = It0(), It1 it1 = It1())
        :m_it0(it0), m_it1(it1)
    {}

    reference operator *()
    {
        return reference(*m_it0, *m_it1);
    }

    value_type operator *() const
    {
        return value_type(*m_it0, *m_it1);
    }

    difference_type operator -(const CPairIterator &other) const
    {
        assert(m_it0 - other.m_it0 == m_it1 - other.m_it1);
        // the iterators always need to have the same position
        // (incomplete check but the best we can do without having also begin / end in either vector)

        return m_it0 - other.m_it0;
    }

    bool operator ==(const CPairIterator &other) const
    {
        assert(m_it0 - other.m_it0 == m_it1 - other.m_it1);
        return m_it0 == other.m_it0;
    }

    bool operator !=(const CPairIterator &other) const
    {
        return !(*this == other);
    }

    bool operator <(const CPairIterator &other) const
    {
        assert(m_it0 - other.m_it0 == m_it1 - other.m_it1);
        return m_it0 < other.m_it0;
    }

    bool operator >=(const CPairIterator &other) const
    {
        return !(*this < other);
    }

    bool operator <=(const CPairIterator &other) const
    {
        return !(other < *this);
    }

    bool operator >(const CPairIterator &other) const
    {
        return other < *this;
    }

    CPairIterator operator +(distance_type d) const
    {
        return CPairIterator(m_it0 + d, m_it1 + d);
    }

    CPairIterator operator -(distance_type d) const
    {
        return *this + -d;
    }

    CPairIterator &operator +=(distance_type d)
    {
        return *this = *this + d;
    }

    CPairIterator &operator -=(distance_type d)
    {
        return *this = *this + -d;
    }

    CPairIterator &operator ++()
    {
        return *this += 1;
    }

    CPairIterator &operator --()
    {
        return *this += -1;
    }

    CPairIterator operator ++(int) // msvc sort actually needs this, g++ does not
    {
        CPairIterator old = *this;
        ++ (*this);
        return old;
    }

    CPairIterator operator --(int)
    {
        CPairIterator old = *this;
        -- (*this);
        return old;
    }
};

template <class It0, class It1>
inline CPairIterator<It0, It1> make_pair_iterator(It0 it0, It1 it1)
{
    return CPairIterator<It0, It1>(it0, it1);
}

It is kind of rough around the edges, maybe I'm just bad at overloading the comparisons, but the amount of differences needed to support different implementations of std::sort makes me think the hackish solution might actually be more portable. But the sorting is much nicer:

struct CompareByFirst {
    bool operator ()(std::pair<size_t, char> a, std::pair<size_t, char> b) const
    {
        return a.first < b.first;
    }
};

std::vector<char> vv; // filled by values
std::vector<size_t> kv; // filled by keys

std::sort(make_pair_iterator(kv.begin(), vv.begin()),
    make_pair_iterator(kv.end(), vv.end()), CompareByFirst());
// nice

And of course it gives the correct result.

the swine
  • 10,713
  • 7
  • 58
  • 100
2

Inspired by a comment by Mark Ransom, this is a horrible hack, and an example of how not to do it. I only wrote it for amusement and because I was wondering how complicated would it get. This is not an answer to my question, I will not use this. I just wanted to share a bizarre idea. Please, do not downvote.

Actually, ignoring multithreading, I believe this could be done:

template <class KeyType, class ValueVectorType>
struct MyKeyWrapper { // all is public to save getters
    KeyType k;
    bool operator <(const MyKeyWrapper &other) const { return k < other.k; }
};

template <class KeyType, class ValueVectorType>
struct ValueVectorSingleton { // all is public to save getters, but kv and vv should be only accessible by getters
    static std::vector<MyKeyWrapper<KeyType, ValueVectorType> > *kv;
    static ValueVectorType *vv;

    static void StartSort(std::vector<MyKeyWrapper<KeyType, ValueVectorType> > &_kv, ValueVectorType &_vv)
    {
        assert(!kv && !vv); // can't sort two at once (if multithreading)
        assert(_kv.size() == _vv.size());
        kv = &_kv, vv = &_vv; // not an attempt of an atomic operation
    }

    static void EndSort()
    {
        kv = 0, vv = 0; // not an attempt of an atomic operation
    }
};

template <class KeyType, class ValueVectorType>
std::vector<MyKeyWrapper<KeyType, ValueVectorType> >
    *ValueVectorSingleton<KeyType, ValueVectorType>::kv = 0;
template <class KeyType, class ValueVectorType>
ValueVectorType *ValueVectorSingleton<KeyType, ValueVectorType>::vv = 0;

namespace std {

template <class KeyType, class ValueVectorType>
void swap(MyKeyWrapper<KeyType, ValueVectorType> &a,
    MyKeyWrapper<KeyType, ValueVectorType> &b)
{
    assert((ValueVectorSingleton<KeyType, ValueVectorType>::vv &&
        ValueVectorSingleton<KeyType, ValueVectorType>::kv)); // if this triggers, someone forgot to call StartSort()
    ValueVectorType &vv = *ValueVectorSingleton<KeyType, ValueVectorType>::vv;
    std::vector<MyKeyWrapper<KeyType, ValueVectorType> > &kv =
        *ValueVectorSingleton<KeyType, ValueVectorType>::kv;
    size_t ai = &kv.front() - &a, bi = &kv.front() - &b; // get indices in key vector
    std::swap(a, b); // swap keys
    std::swap(vv[ai], vv[bi]); // and any associated values
}

} // ~std

And sorting as:

std::vector<char> vv; // filled by values
std::vector<MyKeyWrapper<size_t, std::vector<char> > > kv; // filled by keys, casted to MyKeyWrapper

ValueVectorSingleton<size_t, std::vector<char> >::StartSort(kv, vv);
std::sort(kv.begin(), kv.end());
ValueVectorSingleton<size_t, std::vector<char> >::EndSort();
// trick std::sort into using the custom std::swap which also swaps the other vectors

This is obviously very appalling, trivial to abuse in horrible ways, but arguably much shorter than the pair of iterators and probably similar in performance. And it actually works.

Note that swap() could be implemented inside ValueVectorSingleton and the one injected in the std namespace would just call it. That would avoid having to make vv and kv public. Also, the addresses of a and b could further be checked to make sure they are inside kv and not some other vector. Also, this is limited to sorting by values of only one vector (can't sort by corresponding values in both vectors at the same time). And the template parameters could be simply KeyType and ValueType, this was written in a hurry.

the swine
  • 10,713
  • 7
  • 58
  • 100
  • 1
    Thanks. I guess I'm happy that you're scared (as in horror movie makers probably are happy by pleasing their audiences). – the swine Nov 20 '14 at 23:33
2

Here is a solution I once used to sort an array together with an array of indices (--maybe it is from somewhere over here?):

template <class iterator>
class IndexComparison
{
public:
    IndexComparison (iterator const& _begin, iterator const& _end) :
      begin (_begin),
      end (_end)
    {}

    bool operator()(size_t a, size_t b) const
    {
        return *std::next(begin,a) < *std::next(begin,b);
    }

private:
    const iterator begin;
    const iterator end;
};

Usage:

std::vector<int> values{5,2,5,1,9};
std::vector<size_t> indices(values.size());
std::iota(indices.begin(),indices.end(),0);

std::sort(indices.begin(),indices.end()
        , IndexComparison<decltype(values.cbegin())>(values.cbegin(),values.cend()));

Afterwards, the integers in vector indices are permuted such that they correspond to increasing values in the vector values. It is easy to extend this from less-comparison to general comparison functions.

Next, in order to sort also the values, you can do another

std::sort(values.begin(),values.end());

using the same comparison function. This is the solution for the lazy ones. Of course, you can alternatively also use the sorted indices according to

auto temp=values;
for(size_t i=0;i<indices.size();++i)
{
     values[i]=temp[indices[i]];
}

DEMO


EDIT: I just realized that the above sorts into the opposite direction than the one you were asking for.

davidhigh
  • 14,652
  • 2
  • 44
  • 75
  • Nice. Although, note that your demo gives different results from the example above, I suppose you're sorting by the other vector (doesn't really matter, it proves that it works). – the swine Nov 20 '14 at 23:42
  • No need to change it, the comparison operator can always be changed. It proves the concept well enough. A similar approach was also mentioned in the comments. The problem is that this will require a temp array to reshuffle the second vector. Memory is short in my case, it is actually for an out-of-core algorithm and I would like to be able to process as much data as fits in the RAM at once. – the swine Nov 20 '14 at 23:44
  • No need for the copy: just use the found indices on-the-fly , i.e. use `values[indices[i]]`. And further you have the trivial solution, another sort. Note that chapter 8.4. of [Numerical Recipes](http://apps.nrbook.com/c/index.html) contains some information on ranking and indexing, and the third edition also contains a C++ class ready to be used. – davidhigh Nov 21 '14 at 00:07
  • Going to sleep, it is 1AM in my piece of Europe. Since I dont want to accept either of my answers, I'll accept yours if there is nothing better by tomorrow. Appreciate the effort. – the swine Nov 21 '14 at 00:07
  • After the sorting, I need to send the data back to HW, for that it needs to be in the final order. There is an algorithm to convert a permutation into one that can be performed in-place, but that is additional O(n) or more, don't remember exactly. – the swine Nov 21 '14 at 00:09
  • I also thought of this, but didn't find something in NR ressource. And don't accept my answer if it doesn't answer your question. Good night! – davidhigh Nov 21 '14 at 00:10
  • If you don't mind destroying the permutation in the process (which we don't in this case), this can be done similarly to http://stackoverflow.com/questions/7797540/counting-the-swaps-required-to-convert-one-permutation-into-another/22916192#22916192. – the swine Nov 21 '14 at 08:21