How to store the row number by group

Question

I have data, stored as a data.table dt or a data.frame df, with multiple observations per id-month combination. I want to store the row number in a variable, let's call it row.

I know how to do this in dplyr but want to learn how to do it in (pure) data.table. I assume it is a trivially easy operation, but I can't seem to find a solution that works.

Reprex:

library(dplyr)
library(data.table)

df <- data_frame(id = c(1, 1, 1, 2, 2, 2), month = c(1, 1, 2, 1, 1, 2))
dt <- data.table(df)

My dplyr solution gives the expected output:

df %>%
  group_by(id, month) %>%
  mutate(row = row_number(id))

# A tibble: 6 x 3
# Groups:   id, month [4]
     id month   row
  <dbl> <dbl> <int>
1     1     1     1
2     1     1     2
3     1     2     1
4     2     1     1
5     2     1     2
6     2     2     1

Doing similar operations on a data.table yields something different:

dt[, row := row_number(id), by = c("id", "month")]

   id month row
1:  1     1   1
2:  1     1   1
3:  1     2   1
4:  2     1   1
5:  2     1   1
6:  2     2   1

Or:

dt[, row := .I, by = c("id", "month")]

   id month row
1:  1     1   1
2:  1     1   2
3:  1     2   3
4:  2     1   4
5:  2     1   5
6:  2     2   6

I assume I understand why this happens (row_number(id) simply refers to the first row number of the first row of each group), but do not know how to get the expected result in pure data.table.

Nice. Thanks @Frank. I was wondering what the corresponding one was — Sotos, Sep 25 '17 at 14:14

score 0 · Accepted Answer · answered Sep 25 '17 at 14:14

0

dt[, row := row.names(dt), by = c("id", "month")]
dt

   id month row
1:  1     1   1
2:  1     1   2
3:  1     2   1
4:  2     1   1
5:  2     1   2
6:  2     2   1

answered Sep 25 '17 at 14:14

conor

1,267
10
7

Better answers than mine are in the comments - I didn't see these when responding :) – conor Sep 25 '17 at 14:24

How to store the row number by group

1 Answers1