My question is very similar to this one: How to extract the first n rows per group?
dt
date age name val
1: 2000-01-01 3 Andrew 93.73546
2: 2000-01-01 4 Ben 101.83643
3: 2000-01-01 5 Charlie 91.64371
4: 2000-01-02 6 Adam 115.95281
5: 2000-01-02 7 Bob 103.29508
6: 2000-01-02 8 Campbell 91.79532
We have a dt
and I've added an extra column named val
. First, we want to extract the first n rows within each group.
The solutions from the link provided are:
dt[, .SD[1:2], by=date] # where 1:2 is the index needed
dt[dt[, .I[1:2], by = date]$V1] # for speed
My question is how do I apply a function to the first n rows within each group if that function depends on the subsetted information. I am trying to apply something like this:
# uses other columns for results/ is dependent on subsetted rows
# but keep it simple for replication
do_something <- function(dt){
res <- ifelse(cumsum(dt$val) > 200, 1, 0)
return(res)
}
# first 2 rows of dt by group=date
x <- dt[, .SD[1:2], by=date]
# apply do_something to first 2 rows of dt by group=date
x[, list('age'=age,'name'=name,'val'=val, 'funcVal'= do_something(.SD[1:2])),by=date]
date age name val funcVal
1: 2000-01-01 3 Andrew 93.73546 0
2: 2000-01-01 4 Ben 101.83643 1
3: 2000-01-02 6 Adam 115.95281 0
4: 2000-01-02 7 Bob 103.29508 1
Am I going about this wrong? Is there a more efficient way to do this? I cannot seem to figure out how to apply the "for speed" solution to this. Is there a way to do this without saving the subset-ed results first and applying the function to the first 2 rows by date right away?
Any help is appreciated and below is the code to produce the data above:
date <- c("2000-01-01","2000-01-01","2000-01-01",
"2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
val <- val <- rnorm(6,100,10)
dt <- data.table(date, age, name,val)