Say that I have a dataset like this one:

library(data.table)
example <- data.table(Object = rep(LETTERS[1:3], each = 3), date = as.Date(rep(c(NA, NA, "2020-01-01"), 3)), date_data = 1:9)

example
   Object       date date_data
1:      A       <NA>         1
2:      A       <NA>         2
3:      A 2020-01-01         3
4:      B       <NA>         4
5:      B       <NA>         5
6:      B 2020-01-01         6
7:      C       <NA>         7
8:      C       <NA>         8
9:      C 2020-01-01         9

I would like to set every date_data value within each group (Object) equal to the last date_data value of that group. The desired output is:

   Object       date date_data
1:      A       <NA>         3
2:      A       <NA>         3
3:      A 2020-01-01         3
4:      B       <NA>         6
5:      B       <NA>         6
6:      B 2020-01-01         6
7:      C       <NA>         9
8:      C       <NA>         9
9:      C 2020-01-01         9

Now, I have managed to get exactly what I need using example[, date_data := .SD[.N]$date_data, by = "Object"]. The problem is that I want to make this call inside a loop over a large data table, and evaluating .SD for every group is too slow. Ideally, the code would use .I (like here) or some other optimized data.table feature that I don't know about, but I haven't managed to figure out the right way to do this.

Any ideas?

Djpengo

1 Answer

You can use data.table::last:

library(data.table)

example <- data.table(
  Object = rep(LETTERS[1:3], each = 3),
  date = as.Date(rep(c(NA, NA, "2020-01-01"), 3)),
  date_data = 1:9
)
example[, date_data := last(date_data), by = Object]
example
#    Object       date date_data
#    <char>     <Date>     <int>
# 1:      A       <NA>         3
# 2:      A       <NA>         3
# 3:      A 2020-01-01         3
# 4:      B       <NA>         6
# 5:      B       <NA>         6
# 6:      B 2020-01-01         6
# 7:      C       <NA>         9
# 8:      C       <NA>         9
# 9:      C 2020-01-01         9

I don't know how much optimisation last() gets here. Alternatively, you can index the column directly with .N, which avoids building .SD for each group:

example[, date_data := date_data[.N], by = Object]
example
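
To check the speed difference on your own data, here is a minimal sketch that applies each approach to a larger table and confirms they agree. The table big, its size, and the via_* names are made up for illustration and are not from the question. The last variant is one possible way to use .I (an update join on the last row of each group), not necessarily what the post linked in the question does.

```r
library(data.table)

# Hypothetical test table: 2,000 groups of 5 rows each (sizes are arbitrary).
n_groups <- 2000L
big <- data.table(
  Object    = rep(seq_len(n_groups), each = 5L),
  date_data = seq_len(n_groups * 5L)
)

# Each approach works on its own copy, because := modifies the table in place.
via_sd   <- copy(big)[, date_data := .SD[.N]$date_data, by = Object]
via_last <- copy(big)[, date_data := last(date_data),   by = Object]
via_n    <- copy(big)[, date_data := date_data[.N],     by = Object]

# .I variant: find the row number of the last row per group,
# then update-join those rows back onto the table by Object.
via_i <- copy(big)
idx <- via_i[, .I[.N], by = Object]$V1
via_i[via_i[idx], on = "Object", date_data := i.date_data]

# All four approaches should fill the column with the same values.
stopifnot(
  identical(via_sd$date_data, via_last$date_data),
  identical(via_last$date_data, via_n$date_data),
  identical(via_n$date_data, via_i$date_data)
)

# Timings depend on the machine and the data.table version, so none are quoted.
system.time(copy(big)[, date_data := .SD[.N]$date_data, by = Object])
system.time(copy(big)[, date_data := date_data[.N],     by = Object])
```

Note that copy() is only there so each approach starts from the same input; in real code you would run a single := call on the table itself.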
Victorp
  • Thanks a lot, this was so simple, I'm ashamed that I even asked! Definitely a lot faster (and actually more obvious solution) than .SD. Sometimes it's good to take a break... – Djpengo May 04 '20 at 15:20
  • data.table is very powerful, there's always different ways to achieve the same goal, hard to have all possibilities in mind :) – Victorp May 04 '20 at 15:38