C++ STL: Custom sorting one vector based on contents of another

Question

This is probably best stated as an example. I have two vectors/lists:

People = {Anne, Bob, Charlie, Douglas}
Ages   = {23, 28, 25, 21}

I want to sort the People based on their ages using something like sort(People.begin(), People.end(), CustomComparator), but I don't know how to write the CustomComparator to look at Ages rather than People.

The problem here is that sort will move the elements around in `People` but no in `Ages` so will lose the actual coupling that exists. Look for the various answers that advise grouping together the two attributes to avoid losing the correspondence. — Matthieu M., Nov 12 '09 at 15:45
side note: there is a difference between vector and list. You can use std::sort for vectors but should use the special member function std::list<>::sort for lists as they don't have random access iterators. — sellibitze, Nov 12 '09 at 16:09

Jerry Coffin · Answer 1 · 2020-06-11T05:53:28.830

Obvious Approach

Instead of creating two separate vectors/lists, the usual way to handle this is to create a single vector/list of objects that include both names and ages:

struct person { 
    std::string name;
    int age;
};

To get a sort based on age, pass a comparator that looks at the ages:

std::sort(people.begin(), people.end(), 
          [](auto const &a, auto const &b) { return a.age < b.age; });

In older C++ (pre C++11, so no lambda expressions) you can define the comparison as a member overload of operator< or else as a function-object (an object that overloads operator()) to do the comparison:

struct by_age { 
    bool operator()(person const &a, person const &b) const noexcept { 
        return a.age < b.age;
    }
};

Then your sort would look something like:

std::vector<person> people;
// code to put data into people goes here.

std::sort(people.begin(), people.end(), by_age());

As for choosing between defining operator< for the class, or using a separate comparator object as I show above, it's mostly a question of whether there's a single ordering that's "obvious" for this class.

In my opinion, it's not necessarily obvious that sorting people would always happen by age. If, however, in the context of your program it would be obvious that sorting people would be done by age unless you explicitly specified otherwise, then it would make sense to implement the comparison as person::operator< instead of in a separate comparison class the way I've done it above.

Other Approaches

All that having been said, there are a few cases where it really is impractical or undesirable to combine the data into a struct before sorting.

If this is the case, you have a few options to consider. If a normal sort is impractical because the key you're using is too expensive to swap (or can't be swapped at all, though that's pretty rare), you might be able to use a type where you store the data to be sorted along with just an index into the collection of keys associated with each:

using Person = std::pair<int, std::string>;

std::vector<Person> people = {
    { "Anne", 0},
    { "Bob", 1},
    { "Charlie", 2},
    { "Douglas", 3}
};

std::vector<int> ages = {23, 28, 25, 21};

std::sort(people.begin(), people.end(), 
    [](Person const &a, person const &b) { 
        return Ages[a.second] < Ages[b.second];
    });

You can also pretty easily create a separate index that you sort in the order of the keys, and just use that index to read through the associated values:

std::vector<std::string> people = { "Anne", "Bob", "Charlie", "Douglas" };   
std::vector<int> ages = {23, 28, 25, 21};

std::vector<std::size_t> index (people.size());
std::iota(index.begin(), index.end(), 0);

std::sort(index.begin(), index.end(), [&](size_t a, size_t b) { return ages[a] < ages[b]; });

for (auto i : index) { 
    std::cout << people[i] << "\n";
}

Note, however, that in this case, we haven't really sorted the items themselves at all. We've just sorted the index based on the ages, then used the index to index into the array of data we wanted sorted--but both the ages and names remain in their original order.

Of course, it's theoretically possible that you have such a bizarre situation that none of the above will work at all, and you'll need to re-implement sorting to do what you really want. While I suppose the possibility could exist, I've yet to see it in practice (nor do I even recall seeing a close call where I almost decided that was the right thing to do).

This line should be std::sort(people.begin(), people.end(), by_age); I think there shouldn't be parentheses after "by_age" — jm1234567890, May 22 '10 at 07:12
@jm1234567890: not so. `by_age` is a class, and what we need is an object. The parens mean we're invoking its ctor, so we're passing an object of that class to do the comparison for sorting. If `by_age` were a function, however, you'd be absolutely correct. — Jerry Coffin, May 22 '10 at 07:40

score 11 · Answer 2 · answered Nov 12 '09 at 16:25

11

As others have noted, you should consider grouping People and Ages.

If you can't/don't want to, you could create an "index" to them, and sort that index instead. For example:

// Warning: Not tested
struct CompareAge : std::binary_function<size_t, size_t, bool>
{
    CompareAge(const std::vector<unsigned int>& Ages)
    : m_Ages(Ages)
    {}

    bool operator()(size_t Lhs, size_t Rhs)const
    {
        return m_Ages[Lhs] < m_Ages[Rhs];
    }

    const std::vector<unsigned int>& m_Ages;
};

std::vector<std::string> people = ...;
std::vector<unsigned int> ages = ...;

// Initialize a vector of indices
assert(people.size() == ages.size());
std::vector<size_t> pos(people.size());
for (size_t i = 0; i != pos.size(); ++i){
    pos[i] = i;
}


// Sort the indices
std::sort(pos.begin(), pos.end(), CompareAge(ages));

Now, the name of the nth person is people[pos[n]] and its age is ages[pos[n]]

answered Nov 12 '09 at 16:25

Éric Malenfant

13,938
1
40
42

1

Thanks a bunch. Just a note: some people here are treating "people and ages should be in the same container" as an answer, when it isn't, and there are legitimate reasons to want to be able to do what OP has asked. – pretzlstyle Dec 10 '18 at 20:22
+1. By the way what's the purpose of inheriting from binary_function, is it possible to remove this inheritance, and still use CompareAge? – Kari Aug 25 '19 at 23:38
1

@Kari: binary_function is deprecated since C++11. Prior to that, its purpose was to provide typedefs that were required by some function adaptors. Since no such adaptors are used here, it was not required but it was common practice to define functors like this to make sure that they were compatible with the adaptors. – Éric Malenfant Aug 26 '19 at 14:50

score 3 · Answer 3 · answered Nov 12 '09 at 15:39

3

Generally you wouldn't put data that you want to keep together in different containers. Make a struct/class for Person and overload operator<.

struct Person
{
    std::string name;
    int age;
}

bool operator< (const Person& a, const Person& b);

Or if this is some throw-away thing:

typedef std::pair<int, std::string> Person;
std::vector<Person> persons;
std::sort(persons.begin(), persons.end());

std::pair already implement comparison operators.

answered Nov 12 '09 at 15:39

UncleBens

40,819
6
57
90

But would yhe default operator< for a pair sort on the second member, as the questions ask? I doubt it... – Xavier Nodet Nov 12 '09 at 16:18
@Xavier: he made the `int` the first member of the `std::pair`. – Jerry Coffin Nov 12 '09 at 16:30

score 2 · Answer 4 · answered Nov 12 '09 at 15:52

It doesn't make sense to keep them in two separate data structures: if you reorder People, you no longer have a sensible mapping to Ages.

template<class A, class B, class CA = std::less<A>, class CB = std::less<B> >
struct lessByPairSecond
    : std::binary_function<std::pair<A, B>, std::pair<A, B>, bool>
{
    bool operator()(const std::pair<A, B> &left, const std::pair<A, B> &right) {
        if (CB()(left.second, right.second)) return true;
        if (CB()(right.second, left.second)) return false;
        return CA()(left.first, right.first);
    }
};

std::vector<std::pair<std::string, int> > peopleAndAges;
peopleAndAges.push_back(std::pair<std::string, int>("Anne", 23));
peopleAndAges.push_back(std::pair<std::string, int>("Bob", 23));
peopleAndAges.push_back(std::pair<std::string, int>("Charlie", 23));
peopleAndAges.push_back(std::pair<std::string, int>("Douglas", 23));
std::sort(peopleAndAges.begin(), peopleAndAges.end(),
        lessByPairSecond<std::string, int>());

Sure it can make sense, like if Ages is rarely used you pay for it's existence in increased cache pressure every time you refer to People — mabraham, Dec 16 '19 at 07:34

score 0 · Answer 5 · answered Nov 12 '09 at 15:37

0

I would suggest merging these two lists into a single list of structures. That way you can simply define operator < like dirkgently said.

answered Nov 12 '09 at 15:37

lyricat

1,988
2
12
20

score 0 · Answer 6 · answered Jan 05 '12 at 19:20

Jerry Coffin answer was fully clear and correct.

A just have a related issue that may grant a good discussion to the topic... :)

I had to reorder the columns of a matrix object (lets say TMatrix< T >) based on the sorting of a vector (lets say sequence)... The TMatrix< T > class do not provide reference access to it's rows (thus I can not create a structure to reorder it...) but conveniently provides a method TMatrix< T >::swap(row1, row2)...

So that's the code:

TMatrix<double> matrix;
vector<double> sequence;
// 
// 1st step: gets indexes of the matrix rows changes in order to sort by time
//
// note: sorter vector will have 'sorted vector elements' on 'first' and 
// 'original indexes of vector elements' on 'second'...
//
const int n = int(sequence.size());
std::vector<std::pair<T, int>> sorter(n);
for(int i = 0; i < n; i++) {
    std::pair<T, int> ae;
    ae.first = sequence[i]; 
    ae.second = i;              
    sorter[i] = ae;
}           
std::sort(sorter.begin(), sorter.end());

//
// 2nd step: swap matrix rows based on sorter information
//
for(int i = 0; i < n; i++) {
    // updates the the time vector
    sequence[i] = sorter[i].first;
    // check if the any row should swap
    const int pivot = sorter[i].second;
    if (i != pivot) {
        //
        // store the required swaps on stack
        //
        stack<std::pair<int, int>> swaps;
        int source = pivot;
        int destination = i;
        while(destination != pivot) {
            // store required swaps until final destination 
            // is equals to first source (pivot)
            std::pair<int, int> ae;
            ae.first = source;
            ae.second = destination;
            swaps.push(ae);
            // retrieves the next requiret swap
            source = destination;
            for(int j = 0; j < n; j++) {
                if (sorter[j].second == source) 
                    destination = j;
                    break;
                }
            }
        }                   
        //
        // final step: execute required swaps
        //
        while(!swaps.empty()) {
            // pop the swap entry from the stack
            std::pair<int, int> swap = swaps.top();
            destination = swap.second;                      
            swaps.pop();
            // swap matrix coluns
            matrix.swap(swap.first, destination);
            // updates the sorter
            sorter[destination].second = destination;
        }
        // updates sorter on pivot
        sorter[pivot].second = pivot;
    }
}

I belive that's still O(n log n) since every row that is not in place will swap just one time...

Have fun! :)

C++ STL: Custom sorting one vector based on contents of another

6 Answers6

Obvious Approach

Other Approaches

Linked