4

For supercomputing simulation purpose, I have a structure that contains two big (billions of elements) std::vector: one std::vector of "keys" (64 bits integers) and one std::vector of "values". I cannot use a std::map because in the simulations I consider, vectors are far more optimal than std::map. Moreover, I cannot use a vector of pairs because of some optimization and cache efficiency provided by separate vectors. Moreover I cannot use any extra memory.

So, considering these constaints, what is the most optimized way to sort the two vectors by increasing values of the keys ? (template metaprogramming and crazy compile-time tricks are welcome)

Vincent
  • 57,703
  • 61
  • 205
  • 388
  • Can you provide some more details: How are those vectors populated? How often do you need to sort them? By 'cannot use any extra memory' does this mean NO extra memory whatsoever and that everything has to happen inplace, or that it is very limited? – divinas Jan 21 '14 at 07:19
  • possible duplicate of [Sorting zipped (locked) containers in C++ using boost or the STL](http://stackoverflow.com/questions/13840998/sorting-zipped-locked-containers-in-c-using-boost-or-the-stl) – TemplateRex Jan 23 '14 at 09:23
  • 3
    Friendly advice: if you stopped saying everything you do is for supercomputing it might be easier to get useful answers. Not only is it hard to take that statement seriously, but it's also hard for anyone who hasn't worked with supercomputers to know what the proper answer should be. I've never used a supercomputer myself, but my understanding is that in order to get good performance out of one, you have to code for *that particular architecture*, which doesn't seem to be what you're doing. I may be wrong, not sure, but in any case you should probably avoid dropping that sentence in. – user541686 Jan 23 '14 at 10:46
  • 1
    Another piece of friendly advice: start doing more research before asking questions. If you did you would have found [this quasi-solution](http://www.stanford.edu/~dgleich/notebook/2006/03/sorting_two_arrays_simultaneou.html) from the duplicate Q link. – TemplateRex Jan 23 '14 at 22:42

3 Answers3

3

Two ideas off the top of my head:

  • Take a quicksort implementation and apply it to the "key" vector; but modify the code so that every time it does a swap on the key vector, it also performs the same swap on the value vector.

  • Or, perhaps more in keeping with the spirit of C++, write a custom "wrapper" iterator which iterates over both vectors at once (returning a std::pair when dereferenced). Perhaps Boost has one? You could then combine this with std::sort and a custom comparison function which considers only the "key".

EDIT:

I've used the first suggestion here for a similar problem back in a past life as a C programmer. It's far from ideal for obvious reasons, but it's possibly the quickest way to get something going.

I haven't tried a wrapper iterator like this with std::sort, but TemplateRex in the comments says it won't work, and I'm happy to defer to him on that one.

Tristan Brindle
  • 16,281
  • 4
  • 39
  • 82
  • 3
    +1 for the second suggestion (wrapper iterator). That's the proper C++ way. – Angew is no longer proud of SO Jan 21 '14 at 07:28
  • 1
    There is a `zip_iterator` in [Boost](http://www.boost.org/doc/libs/1_46_0/libs/iterator/doc/zip_iterator.html) however it is unclear whether it could work in this situation (whether it is a RandomAccessIterator). – Matthieu M. Jan 21 '14 at 08:58
  • @MatthieuM. From the documentation: "The iterator_category member of zip_iterator is convertible to the minimum of the traversal categories of the iterator types in the IteratorTuple argument. For example, if the zip_iterator holds only vector iterators, then iterator_category is convertible to boost::random_access_traversal_tag." –  Jan 21 '14 at 09:14
  • 1
    @MatthieuM. zip_iterators are not suited for sorting: http://stackoverflow.com/a/9343991/819272 – TemplateRex Jan 22 '14 at 12:55
  • Sorry, I have to downvote this in its current form. To begin with, the modify the quicksort source is very brittle. Furthermore, without a live example, I won't accept a claim that it's easy to write a wrapper iterator that can be used with `std::sort`. E.g. if `*WrapIt` yields a `pair` of references, this is not guaranteed to work since `std::sort` is not mandated to use `swap` or `iter_swap`. The Boost Mailinglists contains long discussion about this, and AFAIK, there is no easy solution. – TemplateRex Jan 22 '14 at 13:13
  • @TemplateRex: I have tried to manipulate iterators a number of times, and any kind of transformation is heavily annoying when it comes to writing :/ – Matthieu M. Jan 22 '14 at 13:55
  • @TemplateRex That's fair enough. The two suggestions above were off-the-top-of-my-head ideas of what I might try in the OP's position -- no warranty about whether they'd actually work (or be easy/possible to implement) was intended. I'll modify the answer to make this clear. – Tristan Brindle Jan 23 '14 at 04:50
0

I think problem may be splitted into 2 independent parts:

  1. How to make effective iterator for virtual map
  2. Which sorting alorithm to use

Iterator

Implementing iterator the main problem how to return pair of key/value not creating unnecessary copies. We can achieve it by using different types for value_type & reference. My implementation is here.

template <typename _Keys, typename _Values>
class virtual_map
{
public:
    typedef typename _Keys::value_type key_type;
    typedef typename _Values::value_type mapped_type;
    typedef std::pair<key_type, mapped_type> value_type;
    typedef std::pair<key_type&, mapped_type&> proxy;
    typedef std::pair<const key_type&, const mapped_type&> const_proxy;

    class iterator : 
        public boost::iterator_facade < iterator, value_type, boost::random_access_traversal_tag, proxy >
    {
        friend class boost::iterator_core_access;

    public:
        iterator(virtual_map *map_, size_t offset_) :
            map(map_), 
            offset(offset_)
        {}

        iterator(const iterator &other_) 
        {
            this->map = other_.map;
            this->offset = other_.offset;
        }

    private:
        bool equal(const iterator &other) const
        {
            assert(this->map == other.map);
            return this->offset == other.offset;
        }

        void increment() { ++offset; }
        void decrement() { --offset; }

        void advance(difference_type n) { offset += n; }

        reference dereference() const { return reference(map->keys[offset], map->values[offset]); }

        difference_type distance_to(const iterator &other_) const { return other_.offset - this->offset; }

    private:
        size_t offset;
        virtual_map *map;
    };

public:
    virtual_map(_Keys &keys_, _Values &values_) :
        keys(keys_), 
        values(values_) 
    {
        if(keys_.size() != values_.size())
            throw std::runtime_error("different size");
    }

public:
    iterator begin() { return iterator(this, 0); }
    iterator end() { return iterator(this, keys.size()); }

protected:
    _Keys &keys;
    _Values &values;
};

usage sample:

int main(int argc, char* const argv[]) 
{
    std::vector<int> keys_ = { 17, 2, 13, 4, 51, 78, 49, 37, 1 };
    std::vector<std::string> values_ = { "17", "2", "13", "4", "51", "78", "49", "37", "1" };

    typedef virtual_map<std::vector<int>, std::vector<std::string>> map;

    map map_(keys_, values_);

    std::sort(std::begin(map_), std::end(map_), [](map::const_proxy left_, map::const_proxy right_)
    {
        return left_.first < right_.first;
    });

    return 0;
}

Sorting algorithm

Its very hard to reason which method better without additional details. What memory restriction do you have? Is it possible to use concurrency?

sliser
  • 1,645
  • 11
  • 15
  • You should have asked clarifications before, instead of providing an incomplete answer, sir. – Shoe Jan 23 '14 at 10:29
0

There are some issues:

  • Iterating both sequences together requires a pair representing references to the sequence elements - that pair, itself, is no reference. Hence, algorithms working on references will not work.
  • Performance will degenerate (the sequences are loosely coupled) -

An implementation using a pair of references and std::sort:

// Copyright (c) 2014 Dieter Lucking. Distributed under the Boost
// software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#include <algorithm>
#include <chrono>
#include <memory>
#include <iostream>

// None
// ============================================================================

/// A void type
struct None {

    None()
    {}

    /// Explicit conversion to None.
    template <typename T>
    explicit None(const T&)
    {}

    template <typename T>
    None& operator = (const T&) {
        return *this;
    }

    /// Never null.
    None* operator & () const;
};

extern None& none();
inline None* None::operator & () const { return &none(); }

None& none() {
    static None result;
    return result;
}


// IteratorAdaptorTraits
// ============================================================================

namespace Detail {

    // IteratorAdaptorTraits
    // =====================

    template <typename Iterator, typename ReturnType, bool IsReference>
    struct IteratorAdaptorTraits;

    // No reference
    // ============

    template <typename Iterator, typename ReturnType>
    struct IteratorAdaptorTraits<Iterator, ReturnType, false>
    {
        typedef Iterator iterator_type;
        typedef ReturnType return_type;
        typedef ReturnType value_type;
        typedef None reference;
        typedef None pointer;

        static_assert(
            ! std::is_base_of<None, return_type>::value,
            "None as return type.");

        template <typename Accessor>
        static return_type iterator_value(const Accessor& accessor, const Iterator& iterator) {
            return accessor.value(iterator);
        }

        template <typename Accessor>
        static pointer iterator_pointer(const Accessor& accessor, const Iterator& iterator) {
            return &none();
        }
    };

    // Reference
    // =========

    template <typename Iterator, typename ReturnType>
    struct IteratorAdaptorTraits<Iterator, ReturnType, true>
    {
        typedef Iterator iterator_type;
        typedef ReturnType return_type;
        typedef typename std::remove_reference<ReturnType>::type value_type;
        typedef ReturnType reference;
        typedef value_type* pointer;

        static_assert(
            ! std::is_base_of<None, return_type>::value,
            "None as return type.");

        template <typename Accessor>
        static return_type iterator_value(const Accessor& accessor, const Iterator& iterator) {
            return accessor.value(iterator);
        }

        template <typename Accessor>
        static pointer iterator_pointer(const Accessor& accessor, const Iterator& iterator) {
            return &accessor.value(iterator);
        }
    };
} // namespace Detail


// RandomAccessIteratorAdaptor
// ============================================================================

/// An adaptor around a random access iterator.
/// \ATTENTION The adaptor will not fulfill the standard iterator requierments,
///            if the accessor does not support references: In that case, the 
///            reference and pointer type are None.
template <typename Iterator, typename Accessor>
class RandomAccessIteratorAdaptor
{
    // Types
    // =====

    private:
    static_assert(
        ! std::is_base_of<None, Accessor>::value,
        "None as accessor.");

    static_assert(
        ! std::is_base_of<None, typename Accessor::return_type>::value,
        "None as return type.");

    typedef typename Detail::IteratorAdaptorTraits<
        Iterator,
        typename Accessor::return_type,
        std::is_reference<typename Accessor::return_type>::value
    > Traits;

    public:
    typedef typename Traits::iterator_type iterator_type;
    typedef Accessor accessor_type;
    typedef typename std::random_access_iterator_tag iterator_category;
    typedef typename std::ptrdiff_t difference_type;
    typedef typename Traits::return_type return_type;
    typedef typename Traits::value_type value_type;
    typedef typename Traits::reference reference;
    typedef typename Traits::pointer pointer;

    typedef typename accessor_type::base_type accessor_base_type;
    typedef RandomAccessIteratorAdaptor<iterator_type, accessor_base_type> base_type;

    // Tag
    // ===

    public:
    struct RandomAccessIteratorAdaptorTag {};

    // Construction
    // ============

    public:
    explicit RandomAccessIteratorAdaptor(
        iterator_type iterator, const accessor_type& accessor = accessor_type())
    :   m_iterator(iterator), m_accessor(accessor)
    {}

    template <typename IteratorType, typename AccessorType>
    explicit RandomAccessIteratorAdaptor(const RandomAccessIteratorAdaptor<
        IteratorType, AccessorType>& other)
    :   m_iterator(other.iterator()), m_accessor(other.accessor())
    {}

    // Element Access
    // ==============

    public:
    /// The underlaying accessor.
    const accessor_type& accessor() const { return m_accessor; }
    /// The underlaying iterator.
    const iterator_type& iterator() const { return m_iterator; }
    /// The underlaying iterator.
    iterator_type& iterator() { return m_iterator; }
    /// The underlaying iterator.
    operator iterator_type () const { return m_iterator; }

    /// The base adaptor.
    base_type base() const {
        return base_type(m_iterator, m_accessor.base());
    }

    // Iterator
    // ========

    public:
    return_type operator * () const {
        return Traits::iterator_value(m_accessor, m_iterator);
    }
    pointer operator -> () const {
        return Traits::iterator_pointer(m_accessor, m_iterator);
    }

    RandomAccessIteratorAdaptor increment() const {
        return ++RandomAccessIteratorAdaptor(*this);
    }
    RandomAccessIteratorAdaptor increment_n(difference_type n) const {
        RandomAccessIteratorAdaptor tmp(*this);
        tmp.m_iterator += n;
        return tmp;
    }

    RandomAccessIteratorAdaptor decrement() const {
        return --RandomAccessIteratorAdaptor(*this);
    }
    RandomAccessIteratorAdaptor decrement_n(difference_type n) const {
        RandomAccessIteratorAdaptor tmp(*this);
        tmp.m_iterator -= n;
        return tmp;
    }

    RandomAccessIteratorAdaptor& operator ++ () {
        ++m_iterator;
        return *this;
    }
    RandomAccessIteratorAdaptor operator ++ (int) {
        RandomAccessIteratorAdaptor tmp(*this);
        ++m_iterator;
        return tmp;

    }
    RandomAccessIteratorAdaptor& operator += (difference_type n) {
        m_iterator += n;
        return *this;
    }

    RandomAccessIteratorAdaptor& operator -- () {
        --m_iterator;
        return *this;
    }
    RandomAccessIteratorAdaptor operator -- (int) {
        RandomAccessIteratorAdaptor tmp(*this);
        --m_iterator;
        return tmp;
    }

    RandomAccessIteratorAdaptor& operator -= (difference_type n) {
        m_iterator -= n;
        return *this;
    }


    bool equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator == other.m_iterator;
    }
    bool less(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator < other.m_iterator;
    }
    bool less_equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator <= other.m_iterator;
    }
    bool greater(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator > other.m_iterator;
    }
    bool greater_equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator >= other.m_iterator;
    }

    private:
    iterator_type m_iterator;
    accessor_type m_accessor;
};


template <typename Iterator, typename Accessor>
inline RandomAccessIteratorAdaptor<Iterator, Accessor> operator + (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& i,
    typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type n) {
    return i.increment_n(n);
}

template <typename Iterator, typename Accessor>
inline RandomAccessIteratorAdaptor<Iterator, Accessor> operator - (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& i,
    typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type n) {
    return i.decrement_n(n);
}

template <typename Iterator, typename Accessor>
inline typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type
operator - (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.iterator() - b.iterator();
}

template <typename Iterator, typename Accessor>
inline bool operator == (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator != (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return ! a.equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator <  (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.less(b);
}

template <typename Iterator, typename Accessor>
inline bool operator <= (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.less_equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator >  (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.greater(b);
}

template <typename Iterator, typename Accessor>
inline bool operator >= (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.greater_equal(b);
}


// ElementPair
// ============================================================================

/// A pair of references which can mutate to a pair of values.
/// \NOTE If the key is one or two the pair is less comparable
///       regarding the first or second element. 
template <typename First, typename Second, unsigned Key = 0>
class ElementPair
{
    // Types
    // =====

    public:
    typedef First first_type;
    typedef Second second_type;

    // Construction
    // ============

    public:
    /// Reference
    /// \POSTCONDITION reference() returns true
    ElementPair(first_type& first, second_type& second)
    :   m_first(&first), m_second(&second)
    {}

    /// Copy construction
    /// \POSTCONDITION reference() returns false
    ElementPair(const ElementPair& other)
    :   m_first(new(m_first_storage) first_type(*other.m_first)),
        m_second(new(&m_second_storage) second_type(*other.m_second))
    {}

    /// Move construction
    /// \POSTCONDITION reference() returns false
    ElementPair(ElementPair&& other)
    :   m_first(new(m_first_storage) first_type(std::move(*other.m_first))),
        m_second(new(m_second_storage) second_type(std::move(*other.m_second)))
    {}

    ~ElementPair() {
        if( ! reference()) {
            reinterpret_cast<first_type*>(m_first_storage)->~first_type();
            reinterpret_cast<second_type*>(m_second_storage)->~second_type();
        }
    }

    // Assignment
    // ==========

    public:
    /// Swap content.
    void swap(ElementPair& other) {
        std::swap(*m_first, *other.m_first);
        std::swap(*m_second, *other.m_second);
    }

    /// Assign content.
    ElementPair& operator = (const ElementPair& other) {
        if(&other != this) {
            *m_first = *other.m_first;
            *m_second = *other.m_second;
        }
        return *this;
    }

    /// Assign content.
    ElementPair& operator = (ElementPair&& other) {
        if(&other != this) {
            *m_first = std::move(*other.m_first);
            *m_second = std::move(*other.m_second);
        }
        return *this;
    }

    // Element Access
    // ==============

    public:
    /// True if the pair holds references to external elements.
    bool reference() {
        return (m_first != reinterpret_cast<first_type*>(m_first_storage));
    }
    const first_type& first() const { return *m_first; }
    first_type& first() { return *m_first; }

    const second_type& second() const { return *m_second; }
    second_type& second() { return *m_second; }

    private:
    first_type* m_first;
    typename std::aligned_storage<
        sizeof(first_type),
        std::alignment_of<first_type>::value>::type
        m_first_storage[1];

    second_type* m_second;
    typename std::aligned_storage<
        sizeof(second_type),
        std::alignment_of<second_type>::value>::type
        m_second_storage[1];
};

// Compare
// =======

template <typename First, typename Second>
inline bool operator < (
    const ElementPair<First, Second, 1>& a,
    const ElementPair<First, Second, 1>& b)
{
    return (a.first() < b.first());
}


template <typename First, typename Second>
inline bool operator < (
    const ElementPair<First, Second, 2>& a,
    const ElementPair<First, Second, 2>& b)
{
    return (a.second() < b.second());
}

// Swap
// ====

namespace std {
    template <typename First, typename Second, unsigned Key>
    inline void swap(
        ElementPair<First, Second, Key>& a,
        ElementPair<First, Second, Key>& b)
    {
        a.swap(b);
    }
}

// SequencePairAccessor
// ============================================================================

template <typename FirstSequence, typename SecondSequence, unsigned Keys = 0>
class SequencePairAccessor
{
    // Types
    // =====

    public:
    typedef FirstSequence first_sequence_type;
    typedef SecondSequence second_sequence_type;
    typedef typename first_sequence_type::size_type size_type;
    typedef typename first_sequence_type::value_type first_type;
    typedef typename second_sequence_type::value_type second_type;
    typedef typename first_sequence_type::iterator iterator;

    typedef None base_type;
    typedef ElementPair<first_type, second_type, Keys> return_type;

    // Construction
    // ============

    public:
    SequencePairAccessor(first_sequence_type& first, second_sequence_type& second)
    :   m_first_sequence(&first), m_second_sequence(&second)
    {}

    // Element Access
    // ==============

    public:
    base_type base() const { return base_type();    }
    return_type value(iterator pos) const {
        return return_type(*pos, (*m_second_sequence)[pos - m_first_sequence->begin()]);
    }

    // Data
    // ====

    private:
    first_sequence_type* m_first_sequence;
    second_sequence_type* m_second_sequence;
};

This test shows a degenaration of performance (on my system) by a factor of 1.5 for const char* and a factor of 3.4 for a std::string (compared to a single vector holding std::pair(s)).

// Test
// ============================================================================

#define SAMPLE_SIZE 1e1
#define VALUE_TYPE const char*

int main() {
    const unsigned samples = SAMPLE_SIZE;

    typedef int key_type;
    typedef VALUE_TYPE value_type;
    typedef std::vector<key_type> key_sequence_type;
    typedef std::vector<value_type> value_sequence_type;

    typedef SequencePairAccessor<key_sequence_type, value_sequence_type, 1> accessor_type;
    typedef RandomAccessIteratorAdaptor<
        key_sequence_type::iterator,
        accessor_type>
        iterator_adaptor_type;

    key_sequence_type keys;
    value_sequence_type values;
    keys.reserve(samples);
    values.reserve(samples);
    const char* words[] = { "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
    for(unsigned i = 0; i < samples; ++i) {
        key_type k = i % 10;
        keys.push_back(k);
        values.push_back(words[k]);
    }

    accessor_type accessor(keys, values);
    std::random_shuffle(
        iterator_adaptor_type(keys.begin(), accessor),
        iterator_adaptor_type(keys.end(), accessor)
    );

    if(samples <= 10) {
        std::cout << "\nRandom:\n"
                  <<   "======\n";
        for(unsigned i = 0; i < keys.size(); ++i)
            std::cout << keys[i] << ": "  << values[i] << '\n';
    }

    typedef std::pair<key_type, value_type> pair_type;
    std::vector<pair_type> ref;
    for(const auto& k: keys) {
        ref.push_back(pair_type(k, words[k]));
    }

    struct Less {
        bool operator () (const pair_type& a, const pair_type& b) const {
            return a.first < b.first;
        }
    };
    auto ref_start = std::chrono::system_clock::now();
    std::sort(ref.begin(), ref.end(), Less());
    auto ref_end = std::chrono::system_clock::now();
    auto ref_elapsed = double((ref_end - ref_start).count())
                     / std::chrono::system_clock::period::den;

    auto start = std::chrono::system_clock::now();
    std::sort(
        iterator_adaptor_type(keys.begin(), accessor),
        iterator_adaptor_type(keys.end(), accessor)
    );
    auto end = std::chrono::system_clock::now();
    auto elapsed = double((end - start).count())
                 / std::chrono::system_clock::period::den;;

    if(samples <= 10) {
        std::cout << "\nSorted:\n"
                  <<   "======\n";
        for(unsigned i = 0; i < keys.size(); ++i)
            std::cout << keys[i] << ": "  << values[i] << '\n';
    }

    std::cout << "\nDuration sorting " << double(samples) << " samples:\n"
              <<   "========\n"
              << " One Vector: " << ref_elapsed << '\n'
              << "Two Vectors: " << elapsed << '\n'
              << "     Factor: " << elapsed/ref_elapsed << '\n'
              << '\n';
}

(Please adjust SAMPLE_SIZE and VALUE_TYPE)

My conclusion is a sorted view into a sequence of unsorted data might be more aprropiate (but that violates the requirement of the question).