I have data, stored as a data.table dt
or a data.frame df
, with multiple observations per id
-month
combination. I want to store the row number in a variable, let's call it row
.
I know how to do this in dplyr but want to learn how to do it in (pure) data.table. I assume it is a trivially easy operation, but I can't seem to find a solution that works.
Reprex:
library(dplyr)
library(data.table)
df <- data_frame(id = c(1, 1, 1, 2, 2, 2), month = c(1, 1, 2, 1, 1, 2))
dt <- data.table(df)
My dplyr solution gives the expected output:
df %>%
group_by(id, month) %>%
mutate(row = row_number(id))
# A tibble: 6 x 3
# Groups: id, month [4]
id month row
<dbl> <dbl> <int>
1 1 1 1
2 1 1 2
3 1 2 1
4 2 1 1
5 2 1 2
6 2 2 1
Doing similar operations on a data.table yields something different:
dt[, row := row_number(id), by = c("id", "month")]
id month row
1: 1 1 1
2: 1 1 1
3: 1 2 1
4: 2 1 1
5: 2 1 1
6: 2 2 1
Or:
dt[, row := .I, by = c("id", "month")]
id month row
1: 1 1 1
2: 1 1 2
3: 1 2 3
4: 2 1 4
5: 2 1 5
6: 2 2 6
I assume I understand why this happens (row_number(id)
simply refers to the first row number of the first row of each group), but do not know how to get the expected result in pure data.table.