2

I have a data.table like so:

id       date
02 2020-08-27
02 2020-09-07
04 2020-08-27
07 2020-08-27
07 2020-08-27
19 2020-08-28
19 2020-09-07
19 2020-09-07

I want to add a column that will be a sequential count based on the date group within each id. So, the rows with the earliest date within each id will be 1, then 2 for the next later date, and so on.

The result would appear like this:

id       date sequence
02 2020-08-27        1
02 2020-09-07        2
04 2020-08-27        1
07 2020-08-27        1
07 2020-08-27        1
19 2020-08-28        1
19 2020-09-07        2
19 2020-09-07        2

Data:

structure(list(id = c("02", "02", "04", "07", "07", "19", "19", 
                  "19"), date = structure(c(18501, 18512, 18501, 18501, 18501, 
                                            18502, 18512, 18512), class = "Date")), row.names = c(NA, -8L
                                            ), class = c("data.table", "data.frame"))
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
moman822
  • 1,904
  • 3
  • 19
  • 33

2 Answers2

2

We can use frank

library(data.table)
df1[, Sequence := frank(date, ties.method = 'dense'), id]
df1
#   id       date Sequence
#1: 02 2020-08-27        1
#2: 02 2020-09-07        2
#3: 04 2020-08-27        1
#4: 07 2020-08-27        1
#5: 07 2020-08-27        1
#6: 19 2020-08-28        1
#7: 19 2020-09-07        2
#8: 19 2020-09-07        2
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This seems to give the Seq based on the nth row the new date appears within the group, rather than sequentially 1->2->3. See changing the 7th row: `df1[7, date:=as.Date("2020-08-28")]` and re-running, `Sequence` is now 3 instead of 2 – moman822 Sep 07 '20 at 21:59
  • @moman822 It is based on the ranking of the 'date' column (which seems to be your question as described) – akrun Sep 07 '20 at 22:00
  • @moman822 You can change the ties.method to `'dense'`. if the same dates should fall into same group – akrun Sep 07 '20 at 22:03
  • `dense` works, thanks. (I'm trying to get "Earliest date in the group"=1, "Second earliest"=2, "Third earliest"=3, and so on.) – moman822 Sep 07 '20 at 22:04
  • @moman822 in that case, you can use `-` – akrun Sep 07 '20 at 22:05
  • @moman822 i.e. may be `df1[, Sequence := frank(-as.numeric(date), ties.method = 'dense'), id]` – akrun Sep 07 '20 at 22:06
1

Here is a base R option using ave

within(
  df,
  sequence <- ave(as.numeric(date), id, FUN = function(x) cumsum(Negate(duplicated)(x)))
)

which gives

  id       date sequence
1 02 2020-08-27        1
2 02 2020-09-07        2
3 04 2020-08-27        1
4 07 2020-08-27        1
5 07 2020-08-27        1
6 19 2020-08-28        1
7 19 2020-09-07        2
8 19 2020-09-07        2
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81