12

I want to get the indices that would sort a Vec in Rust. Effectively, I want argsort() from numpy.

For example:

let v = vec![1, 7, 4, 2];
let i = argsort(&v);
assert_eq!(i, &[0, 3, 2, 1]);
kmdreko
  • 42,554
  • 6
  • 57
  • 106
Michael Hall
  • 2,834
  • 1
  • 22
  • 40

1 Answers1

25

Not sure if there's something pre-made for this, but its simple enough to implement yourself with .sort_by_key():

pub fn argsort<T: Ord>(data: &[T]) -> Vec<usize> {
    let mut indices = (0..data.len()).collect::<Vec<_>>();
    indices.sort_by_key(|&i| &data[i]);
    indices
}

See it working on the playground.

kmdreko
  • 42,554
  • 6
  • 57
  • 106
  • 3
    What about for a vector of floats? – Katya Feb 20 '22 at 18:13
  • 4
    @Katya Floats do not have a total order. You would need `sort_by_key` to return a wrapper type or use [`sort_by()`](https://doc.rust-lang.org/std/primitive.slice.html#method.sort_by) with a function that can deal with comparing `NaN`. See answers [here](https://stackoverflow.com/q/28247990) or [here](https://stackoverflow.com/q/40408293) for ideas. – kmdreko Feb 20 '22 at 18:27
  • 1
    Is there a reason a built in or library packaged function such as `argsort()` for float data types is not available? I am coming over from Python and things seem a bit convoluted sometimes in Rust (I am sure I will learn to appreciate it immensely with more use). – Katya Feb 21 '22 at 11:57
  • Won't would this perform un-necessary bound checks? – Jorge Leitao Jun 18 '22 at 09:42
  • 1
    @JorgeLeitao Perhaps. You can check the compiler output to see if it was able to elide the checks (though I doubt it for this code). But to avoid unnecessary bounds checks you would have to reach for `unsafe` which you shouldn't do in general unless you can justify it with performance. I doubt eliding the checks would have an appreciable difference in this case because those would be cold branches to take and would be dwarfed by the cost of the sorting. But of course, your mileage may vary, measure it for yourself. – kmdreko Jun 18 '22 at 15:16
  • This doesn't work, you'd need to use indices instead of array elements for this. The test case in the playground works by chance, but if you replace it with a different test case like [234, 23, 4, 235, 24] (ranks are [3, 1, 0, 4, 2]), it falls apart. – J Horseman Apr 17 '23 at 00:21
  • 1
    @JHorseman This produces the same result as [`numpy.argsort()`](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html), which for your input is `[2, 1, 4, 0, 3]`. The output is not the indices the associated elements would have if sorted, rather the output is the indices you need to access in order to view the original list in sorted order. For `[234, 23, 4, 235, 24]`, the first sorted element is `4`, which is in the 2nd (0-indexed) spot, so the first index in the output is `2`. – kmdreko Apr 17 '23 at 01:30
  • Maybe I shouldnt test someones code when I am sleep deprived. Thanks for the clarification – J Horseman Apr 27 '23 at 23:29