You can also use C++'s partial_sort
through Rcpp with a file with the following content:
include "Rcpp.h"
#include <algorithm>
using namespace Rcpp;
inline bool rev_comp(double const i, double const j){
return i > j;
}
// [[Rcpp::export(rng = false)]]
NumericVector cpp_partial_sort(NumericVector x, unsigned const k) {
if(k >= x.size() or k < 1)
throw std::invalid_argument("Invalid k");
if(k + 1 == x.size())
return x;
NumericVector out = clone(x);
std::partial_sort(&out[0], &out[k + 1], &out[x.size() - 1], rev_comp);
return out;
}
We can now confirm that we get the same and make a benchmark:
# simulate data
set.seed(2)
x <- rnorm(10000)
# they all give the same
rk <- 5
setdiff(cpp_partial_sort(x, rk)[1:rk],
-sort(-x, partial = 1:rk)[1:rk])
#R> numeric(0)
setdiff(cpp_partial_sort(x, rk)[1:rk],
sort(x, decreasing = TRUE)[1:5])
#R> numeric(0)
setdiff(cpp_partial_sort(x, rk)[1:rk],
x[order(x, decreasing = TRUE)][1:rk])
#R> numeric(0)
setdiff(cpp_partial_sort(x, rk)[1:rk],
{ p <- length(x) - 0:(rk - 1); sort(x, partial = p)[p] })
#R> numeric(0)
# benchmark
microbenchmark::microbenchmark(
cpp = cpp_partial_sort(x, rk)[1:rk],
snoram = -sort(-x, partial = 1:5)[1:5],
OP = sort(x, decreasing = TRUE)[1:5],
sotos_check = x[order(x, decreasing = TRUE)][1:5],
jogo = {p <- length(x) - 0:4; sort(x, partial = p)[p]}, times = 1000)
#R> Unit: microseconds
#R> expr min lq mean median uq max neval
#R> cpp 23.7 26.1 32.2 27 28 4384 1000
#R> snoram 174.3 185.2 208.3 188 194 3968 1000
#R> OP 528.6 558.4 595.9 562 574 4630 1000
#R> sotos_check 474.9 504.4 550.7 507 519 4446 1000
#R> jogo 172.1 182.1 194.7 186 190 3744 1000
There is the compilation time but this can be offset if cpp_partial_sort
is called many times. The solution can possibly made more generic with a template version like I show here.