Here is a comparison I made. np.argsort
was timed on a float32 ndarray consists of 1,000,000 elements.
In [1]: import numpy as np
In [2]: a = np.random.randn(1000000)
In [3]: a = a.astype(np.float32)
In [4]: %timeit np.argsort(a)
86.1 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
And here is a C++ program do the same procedure but on vectors referring to this answer.
#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>
#include <opencv2/opencv.hpp>
#include <numeric>
#include <utility>
int main()
{
std::vector<float> numbers;
for (int i = 0; i != 1000000; ++i) {
numbers.push_back((float)rand() / (RAND_MAX));
}
double e1 = (double)cv::getTickCount();
std::vector<size_t> idx(numbers.size());
std::iota(idx.begin(), idx.end(), 0);
std::sort(idx.begin(), idx.end(), [&numbers](const size_t &a, const size_t &b)
{ return numbers[a] < numbers[b];});
double e2 = (double)cv::getTickCount();
std::cout << "Finished in " << 1000 * (e2 - e1) / cv::getTickFrequency() << " milliseconds." << std::endl;
return 0;
}
It prints Finished in 525.908 milliseconds.
and it is far slower than the numpy version. So could anyone explain what makes np.argsort
so fast? Thanks.
Edit1: np.__version__
returns 1.15.0
which runs on Python 3.6.6 |Anaconda custom (64-bit)
and g++ --version
prints 8.2.0. Operating system is Manjaro Linux.
Edit2: I treid to compile with This was deleted because I mistakenly run the test with my laptop's DC adapter unplugged, which would cause it slow down. In a fair competition, C-array and vector version perform equally (take about 100ms).-O2
and -O3
flags in g++
and I got result within 216.515 miliseconds and 205.017 miliseconds. That is an improve but still slower than numpy version. (Referring to this question)
Edit3: Another approach would be to replace vector with C like array: float numbers[1000000];
. After that the running time is about 100ms(+/-5ms). Full code here:
#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>
#include <opencv2/opencv.hpp>
#include <numeric>
#include <utility>
int main()
{
//std::vector<float> numbers;
float numbers[1000000];
for (int i = 0; i != 1000000; ++i) {
numbers[i] = ((float)rand() / (RAND_MAX));
}
double e1 = (double)cv::getTickCount();
std::vector<size_t> idx(1000000);
std::iota(idx.begin(), idx.end(), 0);
std::sort(idx.begin(), idx.end(), [&numbers](const size_t &a, const size_t &b)
{ return numbers[a] < numbers[b];});
double e2 = (double)cv::getTickCount();
std::cout << "Finished in " << 1000 * (e2 - e1) / cv::getTickFrequency() << " milliseconds." << std::endl;
return 0;
}