2

I have a data.table with id, location and date.

library(data.table)
library(dplyr)
library(lubridate)

data<-data.table(
  id<-c(rep("1001",8),rep("1002",9),rep("1003",7)),
location<-c(rep("A",4),rep("B",4),rep("A",5),rep("B",4),rep("A",3),rep("B",4)),
 date<-c(seq(dmy('01-01-2014'),dmy('01-08-2014'), by = '1 month'),
    seq(dmy('01-04-2014'),dmy('01-12-2014'), by = '1 month'),
    seq(dmy('01-02-2015'),dmy('01-08-2015'),by = '1 month')))

names(data)<-c("id","location","date")

I want to group_by(id,location) and select say 3 nearest months before change in location and 3 nearest months after the change in location based on the values in months. There could be missing months.

My current solution is to sort and use head/tail for each group and rbind back together.

data<-data[order(id,date)]

data[location=="A",tail(.SD,3),by=id]
data[location=="B",head(.SD,3),by=id]

this will fail if there are multiple dates with same value, in which case I would like to bring back all. Can anyone suggest a better programmatic way to do this with with rleid() perhaps or other?

iboboboru
  • 1,112
  • 2
  • 10
  • 21
  • Can you show some cases when your approach fails and your desired output? – Psidom Aug 24 '16 at 15:59
  • The pedestrian solution is probably `.SD[date %in% head(unique(date), 3)]`. – eddi Aug 24 '16 at 16:15
  • Judging by your last code block and mention of rleid... maybe `data[ data[, intersect(head(.I,3), tail(.I, 3)), by=rleid(id, location)]$V1 ]` borrowed from http://stackoverflow.com/questions/16573995/subset-by-group-with-data-table/16574176#16574176 – Frank Aug 24 '16 at 16:22

0 Answers0