I need to count the number of rows different to a specific row, based on column values, while avoiding loop solutions.
Example: Take a data.table
object A
with four lines where each line represents an individual.
Each individual has 3 values v1, v2, v3
. The objective is to count, for a specific individiual (row), how many other individuals have a strictly larger (or equal) value for all three variables, counted in the variable count
.
A loop version that does the job could be:
A = data.table(matrix(0, nrow = 4, ncol = 3))
colnames(A) <- c("v1","v2","v3")
# Assign values for variables v1, v2, v3
A[1,1] <- 1; A[1,2] <- 1; A[1,3] <- 1
A[2,1] <- 1; A[2,2] <- 1.5; A[2,3] <- 1
A[3,1] <- 0.9; A[3,2] <- 0.5; A[3,3] <- 0.8
A[4,1] <- 2; A[4,2] <- 1.5; A[4,3] <- 2
# Count variable
A$count = NA
for(j in 1:nrow(A)){
A$count[j] = 0
A$count[j] = 0
for(k in 1:nrow(A)){
# Compares the value(s) of individual j relatively to individual k
if( ( A$v1[j] < A$v1[k] ) & ( A$v2[j] < A$v2[k] ) & ( A$v3[j] < A$v3[k] ) ){
A$count[j] = A$count[j]+1 } # if condition above is fullfilled, add 1
}
}
Which yields the result:
v1 v2 v3 count
1: 1.0 1.0 1.0 1
2: 1.0 1.5 1.0 0
3: 0.9 0.5 0.8 3
4: 2.0 1.5 2.0 0
Ideas how to avoid the time intensive loop by obtaining the same result?