I have found that inside a data.table, order() appears to simply enumerate the rows within each group, whereas what I want is the rank of each observation within its group.

Here is a reproducible example:

require(data.table)
N <- 10

set.seed(1)

test <- data.table(
  a = round(rnorm(N,mean=0, sd = 30),0),
  b = c(rep('group_1', N/2 ),rep('group_2', N/2))
)
test[, item_position := order(a, decreasing = TRUE), by = list(b)]
setkey(test, b, item_position)
View(test)

The result (as I get it):

test
      a       b item_position
 1:  48 group_1             1
 2: -25 group_1             2
 3:  10 group_1             3
 4: -19 group_1             4
 5:   6 group_1             5
 6:  -9 group_2             1
 7:  22 group_2             2
 8: -25 group_2             3
 9:  15 group_2             4
10:  17 group_2             5

This is obviously not the ranking I expected. What am I doing wrong, and how can I use order() inside data.table?

Thank you!

Loiisso

2 Answers

I think you have a bit of a misunderstanding of what order does. From everything you describe, you're actually looking for rank:

test[, B_S := rank(-a, ties.method="first"), by = b][] ## Big to Small
#       a       b B_S
#  1: -19 group_1   4
#  2:   6 group_1   3
# .. SNIP ..
#  9:  17 group_2   2
# 10:  -9 group_2   4
test[, S_B := rank(a, ties.method="first"), by = b][]  ## Small to big
#       a       b B_S S_B
#  1: -19 group_1   4   2
#  2:   6 group_1   3   3
# .. SNIP ..
#  9:  17 group_2   2   4
# 10:  -9 group_2   4   2
setkey(test, b, S_B)
test
#       a       b B_S S_B
#  1: -25 group_1   5   1
#  2: -19 group_1   4   2
#  3:   6 group_1   3   3
#  4:  10 group_1   2   4
#  5:  48 group_1   1   5
#  6: -25 group_2   5   1
#  7:  -9 group_2   4   2
#  8:  15 group_2   3   3
#  9:  17 group_2   2   4
# 10:  22 group_2   1   5

There was nothing wrong with the order output (except that it wasn't what you expected). Consider the following:

x <- c(-19, 6, -25, 48, 10)
order(x, decreasing=TRUE)
# [1] 4 5 2 1 3
cbind(x, order(x, decreasing=TRUE))
#        x  
# [1,] -19 4
# [2,]   6 5
# [3,] -25 2
# [4,]  48 1
# [5,]  10 3

This is exactly the same as what you were getting in your data.table result: order() returns the indices that would sort the vector, not the rank of each element. To learn more about the order function, check out this Q and A set: Understanding the order() function
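
To make the difference concrete, here is a small sketch using the same vector as above. The variable names ord and rnk are just for illustration. It shows that order() gives the indices that sort x, while rank() gives each element's position, and that applying order() twice recovers the ranks:

```r
x <- c(-19, 6, -25, 48, 10)

# order(): the indices that would sort x from largest to smallest
ord <- order(x, decreasing = TRUE)
x[ord]   # x rearranged from largest to smallest

# rank(): the position of each element, with the largest value ranked 1
rnk <- rank(-x, ties.method = "first")

# order() of the ordering inverts the permutation, which yields the ranks
all(order(ord) == rnk)
```

So order(order(...)) would also have worked in the original by-group call, but rank() states the intent more directly.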

A5C1D2H2I1M1N2O1R2T1

Ananda's solution is the way to go for smaller datasets. For larger ones, where speed becomes an issue, you'll probably want to use data.table's setkey instead:

test[, idx := .I]                   # save the original row index to restore the order later
setkey(test, b, a)                  # sort by group, then by a (ascending)
test[, pos := seq_len(.N), by = b]  # position within each group (1 = smallest a)
setkey(test, idx)                   # back to the original row order
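
If your data.table version provides frank() (it was added after this answer was written), a possible alternative is to rank within groups directly, without re-keying the table. This is only a sketch under that assumption; the column names pos_desc and pos_base are made up for the example, and negating a puts the largest value first, as in the question:

```r
library(data.table)

set.seed(1)
N <- 10
test <- data.table(
  a = round(rnorm(N, mean = 0, sd = 30), 0),
  b = rep(c("group_1", "group_2"), each = N / 2)
)

# frank() is data.table's own rank function; -a ranks the largest value as 1
test[, pos_desc := frank(-a, ties.method = "first"), by = b]

# base R equivalent, kept only to compare the two results
test[, pos_base := rank(-a, ties.method = "first"), by = b]

all(test$pos_desc == test$pos_base)
```

Whether this beats the setkey approach on large data is something you would need to benchmark on your own tables.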
eddi
  • you can probably also use the internal fast order functions directly (instead of resorting to `setkey`), but I don't really know which of those functions to use when – eddi Feb 18 '14 at 17:04