Counting number of rows "smaller" than other rows while avoiding loop

Question

I need to count the number of rows different to a specific row, based on column values, while avoiding loop solutions.

Example: Take a data.tableobject A with four lines where each line represents an individual. Each individual has 3 values v1, v2, v3. The objective is to count, for a specific individiual (row), how many other individuals have a strictly larger (or equal) value for all three variables, counted in the variable count. A loop version that does the job could be:

A = data.table(matrix(0,  nrow = 4, ncol = 3))

colnames(A) <- c("v1","v2","v3")

# Assign values for variables v1, v2, v3
A[1,1] <- 1; A[1,2] <- 1; A[1,3] <- 1
A[2,1] <- 1; A[2,2] <- 1.5; A[2,3] <- 1
A[3,1] <- 0.9; A[3,2] <- 0.5; A[3,3] <- 0.8
A[4,1] <- 2; A[4,2] <- 1.5; A[4,3] <- 2

# Count variable 
A$count = NA

for(j in 1:nrow(A)){
  A$count[j] = 0
  A$count[j] = 0    
  for(k in 1:nrow(A)){
    # Compares the value(s) of individual j relatively to individual k
    if( ( A$v1[j] < A$v1[k] ) & ( A$v2[j] < A$v2[k] ) & ( A$v3[j] < A$v3[k] ) ){
      A$count[j] = A$count[j]+1    } # if condition above is fullfilled, add 1 
    
  }
}

Which yields the result:

    v1  v2  v3 count
1: 1.0 1.0 1.0     1
2: 1.0 1.5 1.0     0
3: 0.9 0.5 0.8     3
4: 2.0 1.5 2.0     0

Ideas how to avoid the time intensive loop by obtaining the same result?

score 6 · Answer 1 · answered Sep 16 '21 at 21:59

data.table has a special symbol .EACHI which allows for grouping by each i in a join. So, we can join and aggregate simultaneously, even in a non-equi self join:

A[A, on = .(v1>v1, v2>v2, v3>v3), .N, by = .EACHI]

    v1  v2  v3 N
1: 1.0 1.0 1.0 1
2: 1.0 1.5 1.0 0
3: 0.9 0.5 0.8 3
4: 2.0 1.5 2.0 0

For a more detailed explanation of .EACHI, please, see this answer .

Waldi · Accepted Answer · 2021-09-16T13:38:46.507

2

You could use a self non-equi join:

A[A,.(v1=i.v1,v2=i.v2,v3=i.v3,x.v1),on=.(v1>v1,v2>v2,v3>v3)][
  ,.(count=sum(!is.na(x.v1))),by=.(v1,v2,v3)]

    v1  v2  v3 count
1: 1.0 1.0 1.0     1
2: 1.0 1.5 1.0     0
3: 0.9 0.5 0.8     3
4: 2.0 1.5 2.0     0

edited Sep 16 '21 at 13:38

answered Sep 16 '21 at 13:33

Waldi

39,242
6
30
78

Counting number of rows "smaller" than other rows while avoiding loop

2 Answers2