I have a vector t of thresholds that separates my data vector y into length(t) - 1 categories.
y <- runif(100) # data vector
t <- c(0, 0.5, 1) # threshold vector
In this example, category 1 corresponds to data points that satisfy 0 < y < 0.5, and category 2 corresponds to data points that satisfy 0.5 < y < 1. To find the corresponding vector of categories, a naive looping approach would be:
nc <- length(t) - 1 # number of categories
categories <- numeric(length = length(y)) # vector of categories
for (cc in 1:nc) { # loop over categories
  lower <- t[cc] # lower bound for category cc
  upper <- t[cc + 1] # upper bound for category cc
  cc.log <- (lower < y) & (y < upper) # logical vector where y satisfies thresholds
  categories[cc.log] <- cc # assign active category where thresholds are satisfied
}
Is there an easier and more scalable solution that takes as inputs the data vector y and the threshold vector t, and returns the vector of categories (categories)?
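For reference, here is a minimal sketch of two vectorized base R candidates, findInterval() and cut(). These are not necessarily the answers benchmarked below; the names categories_fi and categories_cut are made up for illustration. Both assume t is sorted in increasing order. Note that they treat boundary values differently from the loop: findInterval() uses intervals closed on the left, t[i] <= y < t[i + 1], while cut() by default uses intervals closed on the right, t[i] < y <= t[i + 1]. The loop assigns 0 to any y exactly equal to a threshold.

```r
set.seed(1)                # fixed seed so the example is reproducible
y <- runif(100)            # data vector, as in the question
t <- c(0, 0.5, 1)          # threshold vector, as in the question (must be sorted)

# findInterval() returns, for each element of y, the index i such that
# t[i] <= y < t[i + 1]; values below t[1] map to 0 and values at or above
# the last threshold map to length(t).
categories_fi <- findInterval(y, t)

# cut() with labels = FALSE returns the integer code of the interval
# (t[i], t[i + 1]] that contains each element; values outside all
# intervals become NA.
categories_cut <- cut(y, breaks = t, labels = FALSE)
```

For data strictly inside the outer thresholds and not equal to any threshold, as with runif(100) here, both should agree with the loop. If points outside the threshold range or exactly on a threshold matter, the out-of-range codes (0, length(t), or NA) would need explicit handling.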
Edit: Choosing akrun's solution as it is the fastest.
Unit: microseconds
         expr       min         lq       mean     median         uq       max neval
  akrun(y, t)   352.386   357.7325   382.8909   369.4925   380.1840  1295.361   100
 darren(y, t)   520.882   545.2580   600.2583   602.9905   639.5555   886.097   100
 myself(y, t) 11261.807 11415.7625 12403.3405 11653.3235 13218.9600 20399.890   100