4

I have two vectors:

vec1 <- c(0, 1, 2, 3, 4, 5, 6, 7, 9)
vec2 <- c(1, 2, 7, 5, 3, 6, 80, 4, 8)

I would like to set the same order in vec1 as it is in vec2. For example, in vec2 the highest number (position 9) is in position 7, so I would like to put the highest number in vec1 (position 9, number 9) to position 7.

Expected output:

vec1 <- c(0, 1, 6, 4, 2, 5, 9, 3, 7)

I don't have any duplicated values in any vector.

I'm primarily interested in efficient Rcpp solutions but also anything in R is welcome.

MarcW
  • 108
  • 8
  • 3
    There are _two_ Rcpp Gallery posts about precisely this. Did you look there? [One for Rcpp](https://gallery.rcpp.org/articles/subsetting/) and a [second one for RcppArmadillo](https://gallery.rcpp.org/articles/armadillo-subsetting/index.html). – Dirk Eddelbuettel Mar 22 '20 at 18:10
  • Yes, have looked everywhere. Have only seen sort_index variants everywhere.. If you have a link, I am grateful. – MarcW Mar 22 '20 at 18:11
  • 2
    I strongly doubt you "looked everywhere". We very likely have duplicates of that question here too. – Dirk Eddelbuettel Mar 22 '20 at 18:13
  • 1
    If there are duplicates, please do close the question and refer me to them as I cannot find them. For the links unfortunately I'm not able to see the link between this sorting problem and subsetting... – MarcW Mar 22 '20 at 18:16

4 Answers4

10

Another baseR option is match

vec1[match(vec2, sort(vec2))]
# [1] 0 1 6 4 2 5 9 3 7

edit

Including a benchmark with larger sample size

set.seed(42)
n <- 1e6
vec1 <- seq_len(n)
vec2 <- sample(1:1e7, size = n)

benchmarks <- bench::mark(match = vec1[match(vec2, sort(vec2))],
                          rank = vec1[rank(vec2)],
                          frank = vec1[data.table::frank(vec2)],
                          order_order = vec1[order(order(vec2))],
                          rcpp_order_order = foo(vec1, vec2),
                          iterations = 25)
benchmarks[ , 1:3]

Result

# A tibble: 5 x 3
#  expression            min   median
#  <bch:expr>       <bch:tm> <bch:tm>
#1 match             259.8ms    322ms
#2 rank              825.9ms    876ms
#3 frank              88.6ms    134ms
#4 order_order       110.6ms    139ms
#5 rcpp_order_order  793.5ms    893ms
markus
  • 25,843
  • 5
  • 39
  • 58
  • 1
    Nice re: benchmarks. I suspected the `Rcpp` solution might slow down considerably given the need for copies..... (which is why I put that warning in the answer) but thought it would be good for the OP to have one available since they mentioned it specifically in the question. In any case, looks like `data.table::frank()` is the way to go! – duckmayr Mar 29 '20 at 23:20
6

We can adapt the Rcpp version of order() from this answer (to account for the fact that you do not want to check for duplicates and adding a function to order by an order of an ordering) to make the following Rcpp solution:

#include <Rcpp.h>

Rcpp::IntegerVector order(const Rcpp::NumericVector& x) {
    return Rcpp::match(Rcpp::clone(x).sort(), x);
}

Rcpp::IntegerVector order(const Rcpp::IntegerVector& x) {
    return Rcpp::match(Rcpp::clone(x).sort(), x);
}

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector x,
                        const Rcpp::NumericVector y) {
    return x[order(order(y))-1];
}

Then we get the expected results:

library(Rcpp)
sourceCpp("foo.cpp")

vec1 <- c(0, 1, 2, 3, 4, 5, 6, 7, 9)
vec2 <- c(1, 2, 7, 5, 3, 6, 80, 4, 8)

foo(vec1, vec2)
# [1] 0 1 6 4 2 5 9 3 7

with decent performance (comparisons are to the R solutions presented by other answers):

benchmarks <- bench::mark(match = vec1[match(vec2, sort(vec2))],
                          rank = vec1[rank(vec2)],
                          order_order = vec1[order(order(vec2))],
                          rcpp_order_order = foo(vec1, vec2),
                          iterations = 10000)
benchmarks[ , 1:3]

# # A tibble: 4 x 3
#   expression            min   median
#   <bch:expr>       <bch:tm> <bch:tm>
# 1 match              28.4µs  31.72µs
# 2 rank               7.99µs   9.84µs
# 3 order_order       26.27µs  30.61µs
# 4 rcpp_order_order   2.51µs   3.23µs

Note that this solution only works if there are no duplicates. (If you might run into duplicates, adding a check is demonstrated in the linked-to answer). Also note that these benchmarks were just done on this data; I don't know for sure how they change at scale.

duckmayr
  • 16,303
  • 3
  • 35
  • 53
5

We could use rank

vec1[rank(vec2)]
#[1] 0 1 6 4 2 5 9 3 7

Or with order

vec1[order(order(vec2))]
#[1] 0 1 6 4 2 5 9 3 7

Or as @markus suggested an option with frank from data.table

library(data.table)
vec1[frank(vec2)]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 3
    Just benchmarked your first option with `data.table::frank()` which seems to outperform all other options so far. Maybe include that option too? – markus Mar 25 '20 at 22:14
0

If I understand you correctly, you want vec1 to follow the same order of vec1. That is, is vec2 is increasing, so should the values of vec1; if vec2 is decreasing, so should vec1 and so on.

sort(vec1)[order(vec2)]
user2332849
  • 1,421
  • 1
  • 9
  • 12