3

Can anyone suggest a fast way to get the rank of each element in a vector. I don't need to sort the vector, but only get the index of each element if the vector was sorted

for ex: {40, 20, 10, 30} should give {3, 1, 0, 2}

Will i be able to get a speedup because i don't actually have to sort the data in-place?

Vikash Balasubramanian
  • 2,921
  • 3
  • 33
  • 74

5 Answers5

6

The exact same proof of the lower bound on sorting applies here. Sans additional information (key distribution, etc.), it is n log(n) at a lower bound, and you might as well sort. Formally, anything lower would allow you to compress permutations below the Kolmogorov complexity.


That being said, there is the question of how to sort the indices. See here.

Community
  • 1
  • 1
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
  • This is a nice answer, but i did know that you can't beat the nlog(n) lower bound, what i meant by speed-up was within that but in the intricate details.. like by a constant factor or something – Vikash Balasubramanian Jun 06 '15 at 16:36
  • @VikashB See update. I have to say that, in this respect, your question is something of a duplicate, no? – Ami Tavory Jun 06 '15 at 16:42
  • Not really, since the constant factors are going to be very different and you can't exactly just use numpy in C++. – Puppy Jun 06 '15 at 16:45
  • Why can't you use numpy? ``numpy.argsort(foo)`` should work for your example. My money would be that this gives the best constants too. – Ami Tavory Jun 06 '15 at 16:48
  • @AmiTavory: Because numpy is not C++. It's Python. `g++` will not acept `numpy.argsort(foo)`. – Puppy Jun 06 '15 at 16:54
  • @Puppy LOL, you're absolutely right. That's what happens from over-concurrency at SO. – Ami Tavory Jun 06 '15 at 16:59
4

You may use the following:

template <typename T>
std::vector<std::size_t> compute_order(const std::vector<T>& v)
{
    std::vector<std::size_t> indices(v.size());
    std::iota(indices.begin(), indices.end(), 0u);
    std::sort(indices.begin(), indices.end(), [&](int lhs, int rhs) {
        return v[lhs] < v[rhs];
    });
    std::vector<std::size_t> res(v.size());
    for (std::size_t i = 0; i != indices.size(); ++i) {
        res[indices[i]] = i;
    }
    return res;
}

Live example

Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • res is a local vector so it is gone when the function returns. To fix this, change res to be a pointer to a vector, then res = new vector ... , and return the pointer res. The calling code will need to delete res when done. To sort the original array or vector A into array or vector B, for i = 0 to size -1, B[(*res)[i]] = A[i] . – rcgldr Jun 06 '15 at 22:05
  • 2
    @rcgldr: `res` is not gone, it's being returned by the function. No need for a pointer nonsense. – DanielKO Jun 07 '15 at 00:25
  • Can't you just return `indices`? As far as I can tell, the for loop just copies the vector. – Jan Hošek Sep 03 '20 at 13:04
2

I can think of two ways: (but I don't think they will be faster)

  1. put <value, index> pair in a map
  2. put index in another vector, sort that vector with proper comparison function
Xiaotian Pei
  • 3,210
  • 21
  • 40
2

A first approach is to do a copy of array and sort it. Afterward you traverse the original array and on each item you perform a binary search for determing the rank. During this traversing you produce the desired sequence. With this aproach you takes O(n) for the copy, plus O(n lg n) for the sort and finally O(n lg n) for producing the sequence of ranks.

Another way is to insert all the items in a binary search tree (a balanced one such as an avl or red-black). This takes O(n lg n). Your binary tree must support "the rank extension"; that is, the sizes of each subtree must be stored in the nodes. These trees can export the operation position(key) which returns the rank of key.

Afterward, you traverse your array and for each entry you call to position(array[i]). During this process you are producing the sequence of rank which is parallel to your array. This takes O(n lg n).

I think that the advantage of this approach respect to copy into an array of pairs and then sort it or simply to sort a copy of array and then to determine the rank by searching with binary search in the copied array, is that you avoid the extra copy from the sorted array of pairs to the sequence of ranks.

Added and corrected:

According to @xiaotian-peiI answer, I think it would be even better simply to insert pairs (key, index) in a deterministically balanced binary search tree (avl or red-black) sorted by keys; that takes O(n lg n). Then you traverse the binary tree inorder extracting the indexes, what takes O(n). Finally you free the tree, what takes O(n). So the total would be O(n lg n) + O(n) + O(n)

Maybe still more efficient according to scale and not the same complexity: to use a heap of pairs (key,index) and successively extract from it for building the sequence of ranks.

And very probably faster and sure less space consuming: the algorithm published by Jarod42, what I think is O(n) + O(n lg n) + O(n) too, but that would profit more the cache

lrleon
  • 2,610
  • 3
  • 25
  • 38
  • Perfect, exactly what i wanted, the second approach was great, i hadn't heard about the rank extension before – Vikash Balasubramanian Jun 06 '15 at 17:38
  • 1
    The answer provided by Jarod42 below is more efficient. You sort indices according to the array, then make a single pass (O(n)) using the sorted indices to create an array (vector res in Jarod42's answer), of the indices wanted in the original post. – rcgldr Jun 06 '15 at 22:59
  • @rcgldr: yes! it is a very beautiful solution. I edited the answer and explained a third approach, what I think would be still more efficient. As I understand the Jarod42 proposal, this would take O(n) for initializing indexes (iota call), then O(n lg n) for sorting it and finally O(n) for extracting the ranks; this would take O(n) + O(n lg n) + O(n) versus O(n lg n) + O(n) – lrleon Jun 06 '15 at 23:55
  • @Irleon - I don't understand what method takes O(n lg n) + O(n). For the binary tree approach, you have to insert elements (create pairs if using pairs) into the tree, then traverse the tree. Tree operations are going to take more time than the indexing used by the sort and the conversion of "source" indices to "destination" indices. – rcgldr Jun 07 '15 at 00:16
  • I'm wrong, because I forgot to consider the tree freeing. So, there are n pair insertions in a binary search tree; what takes O(n lg n). Then you traverse the binary tree inorder, and during this traversal you copy the indexes to the sequence of ranks. The traversal can be done in O(n). Finally you must free the tree what takes O(n). So, that is O(n) + O(n lg n) + O(n), what is the same complexity than Jarod42 approach. But now I think the last would be faster because it is sure consumes less space and maybe profit more the hardware cache that the binary tree, Thanks. I going to correct – lrleon Jun 07 '15 at 00:40
  • Anyway `O(n log n) + k O(n)` is still `O(n log n)`. – Jarod42 Jun 07 '15 at 08:26
0

For your case of numbers, sorting the array itself isn't harder than sorting the indices -- you'd be constructing a index set and order that by the original values.

Marcus Müller
  • 34,677
  • 4
  • 53
  • 94