
I have a data.table that includes a displayDate field and also many rows per user, each with a different displayDate. Each user joined the service and started logging data at a different point in time. For each user I want to find the first week's worth of data only, so I'd like to get rid of rows that are not within the first week for a given user. Here's what I'd like to do, but this produces an error:

early_data = dt[displayDate <= min(displayDate) + 7, , by=user]

And I get the following error:

Error in `[.data.table`(dt, displayDate <= min(displayDate) + 14, , by = user) : 
  'by' or 'keyby' is supplied but not j

Is there a way to conditionally select rows by grouping on another column? What's wrong with my syntax above?

helloB
  • You could also do: `dt[, .SD[displayDate <= min(displayDate) + 7], by=user]` – Jaap Jan 17 '16 at 07:25
  • for the record, this use case is described as improvement in [data.table#1105](https://github.com/Rdatatable/data.table/issues/1105) – jangorecki Jan 17 '16 at 13:02

1 Answer

One option is to get the row indices (.I) of the rows that satisfy the condition within each user, and then use those indices to subset the table.

dt[dt[, .I[displayDate <= min(displayDate) + 7], by = user]$V1]

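The problem with the OP's code is that it supplies only i together with by: between them there are just commas (, ,) and no j, which is exactly what the error message reports. by controls how j is evaluated per group, so it cannot be used without a j. Note also that i is evaluated on the whole table before any grouping, so min(displayDate) in i would be the overall minimum rather than the per-user one.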
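To make the two steps of the .I idiom visible, here is a minimal sketch on a small hand-made table. The name toy and its dates are hypothetical and chosen only for illustration; they are not the OP's data.

```r
library(data.table)

# Hypothetical toy data: two users, dates picked by hand for illustration
toy <- data.table(
  user = c(1L, 1L, 1L, 2L, 2L),
  displayDate = as.Date(c("2014-07-01", "2014-07-05", "2014-07-20",
                          "2014-08-01", "2014-08-10"))
)

# Step 1: within each user, keep the row numbers (.I refers to rows of the
# full table) whose date is within 7 days of that user's first date.
# Because j is unnamed, the result column is called V1.
idx <- toy[, .I[displayDate <= min(displayDate) + 7], by = user]

# Step 2: use those row numbers as i to subset the original table
early <- toy[idx$V1]

print(idx$V1)   # rows 1, 2 (user 1) and row 4 (user 2)
print(early)
```

Here by has a j to group over, and the condition is evaluated per user inside j rather than in i.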

data

library(data.table)

set.seed(24)
dt <- data.table(displayDate = sample(seq(as.Date("2014-07-01"),
            length.out = 20, by = "1 day")), user = rep(1:4, each = 5))
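As a quick sanity check on the sample data, the sketch below applies the .I approach and the .SD variant from the comments, then confirms that every kept row lies within 7 days of that user's first date. The names via_I, via_SD, firsts and merged are introduced here only for illustration, and the exact rows depend on the R version's random number generator, so no output is shown.

```r
library(data.table)

set.seed(24)
dt <- data.table(displayDate = sample(seq(as.Date("2014-07-01"),
            length.out = 20, by = "1 day")), user = rep(1:4, each = 5))

# .I approach from the answer
via_I <- dt[dt[, .I[displayDate <= min(displayDate) + 7], by = user]$V1]

# .SD approach from the comments; usually simpler to read,
# though it can be slower on tables with many groups
via_SD <- dt[, .SD[displayDate <= min(displayDate) + 7], by = user]

# Per-user first date, joined back to check the 7-day window
firsts <- dt[, .(first = min(displayDate)), by = user]
merged <- merge(via_I, firsts, by = "user")

all(merged$displayDate <= merged$first + 7)   # every kept row is in the window
nrow(via_I) == nrow(via_SD)                   # both approaches keep the same count
```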
akrun