1

I have data, stored as a data.table dt or a data.frame df, with multiple observations per id-month combination. I want to store the row number in a variable, let's call it row.

I know how to do this in dplyr but want to learn how to do it in (pure) data.table. I assume it is a trivially easy operation, but I can't seem to find a solution that works.

Reprex:

library(dplyr)
library(data.table)

df <- data_frame(id = c(1, 1, 1, 2, 2, 2), month = c(1, 1, 2, 1, 1, 2))
dt <- data.table(df)

My dplyr solution gives the expected output:

df %>%
  group_by(id, month) %>%
  mutate(row = row_number(id))

# A tibble: 6 x 3
# Groups:   id, month [4]
     id month   row
  <dbl> <dbl> <int>
1     1     1     1
2     1     1     2
3     1     2     1
4     2     1     1
5     2     1     2
6     2     2     1

Doing similar operations on a data.table yields something different:

dt[, row := row_number(id), by = c("id", "month")]

   id month row
1:  1     1   1
2:  1     1   1
3:  1     2   1
4:  2     1   1
5:  2     1   1
6:  2     2   1

Or:

dt[, row := .I, by = c("id", "month")]

   id month row
1:  1     1   1
2:  1     1   2
3:  1     2   3
4:  2     1   4
5:  2     1   5
6:  2     2   6

I assume I understand why this happens (row_number(id) simply refers to the first row number of the first row of each group), but do not know how to get the expected result in pure data.table.

Sotos
  • 51,121
  • 6
  • 32
  • 66

1 Answers1

0
dt[, row := row.names(dt), by = c("id", "month")]
dt

   id month row
1:  1     1   1
2:  1     1   2
3:  1     2   1
4:  2     1   1
5:  2     1   2
6:  2     2   1
conor
  • 1,267
  • 10
  • 7