3

I have a data table with multiple factors, for example:

dt <- data.table(station=c(1,1,2,2,3), station.type=c("X","X","Y","Y","Y"), stage=c("A","B","A","B","A"), value=10:14)

   station station.type stage value
1:       1            X     A    10
2:       1            X     B    11
3:       2            Y     A    12
4:       2            Y     B    13
5:       3            Y     A    14

Each station is associated with a type (My actual data has over 50 stations and 10 types). In the example, the combination station 3 / stage B is missing. I want to add rows for the missing combinations, while retaining the type associated with the station.

I started from Matt Dowle's answer to this question: Fastest way to add rows for missing values in a data.frame?

setkey(dt, station, stage)
dt[CJ(station, stage, unique=TRUE)]

   station station.type stage value
1:       1            X     A    10
2:       1            X     B    11
3:       2            Y     A    12
4:       2            Y     B    13
5:       3            Y     A    14
6:       3           NA     B    NA

But then I have to do another merge with the original data table to fill in the type for each station.

Is there a way to it all in one line - something like:

dt[CJ(cbind(station, station.type), stage, unique=TRUE)]

(of course this doesn't work because CJ takes vectors as arguments)

Community
  • 1
  • 1
KTWillow
  • 297
  • 1
  • 6
  • 5
    perhaps `dt[, .SD[.(stage=c("A", "B")), on="stage"], by=.(station, station.type)]`? – Arun Mar 03 '16 at 16:18
  • 1
    or with the `na.locf` function from the `zoo` package: `dt[CJ(station, stage, unique=TRUE)][, station.type := na.locf(station.type)]` – Jaap Mar 03 '16 at 16:30
  • 3
    @Arun, thank you, can you post this as an answer so I can accept it? – KTWillow Mar 03 '16 at 16:46

1 Answers1

4

Here's one way:

dt[, .SD[.(stage=c("A", "B")), on="stage"], by=.(station, station.type)]
Arun
  • 116,683
  • 26
  • 284
  • 387