3

My problem is similar to this previous question Fastest way to add rows for missing values in a data.frame?

I can't figure out how to add rows padded with "NA" when the min/max is different by group.

> red<-data.frame(project = c(6, 6, 6, 6, 6, 9, 9, 9), period =c(1, 2, 5:7, 2, 4, 5), v3=letters[1:8], v4=c("red", "yellow", recursive = T))
> red
  project period v3     v4
1       6      1  a    red
2       6      2  b yellow 
3       6      5  c    red
4       6      6  d yellow
5       6      7  e    red
6       9      2  f yellow
7       9      4  g    red
8       9      5  h yellow

I want it to look like:

project period v3     v4
      6      1  a    red
      6      2  b yellow
      6      3 NA     NA
      6      4 NA     NA
      6      5  c    red
      6      6  d yellow
      6      7  e    red
      9      2  f yellow
      9      3 NA     NA
      9      4  g    red
      9      5  h yellow

When I used

library(data.table)
DT=as.data.table(red)
setkey(DT, project, period)

DT[CJ(unique(project), seq(min(period), max(period)))]

it made each project group have 7 periods; Project 6 should have periods 1-7, but Project 9 should have periods 2-5.

I've tried fiddling with .SD[ which.max(period)], by=project] but no cigar.

I thought it should be something simple in the seq(), but I tried seq(min(period, by=project)) with no luck

Thank you!

Community
  • 1
  • 1
MiamiCG
  • 135
  • 1
  • 8

3 Answers3

3
DT[setkey(DT[, .(min(period):max(period)), by = project], project, V1)]
#    project period v3     v4
# 1:       6      1  a    red
# 2:       6      2  b yellow
# 3:       6      3 NA     NA
# 4:       6      4 NA     NA
# 5:       6      5  c    red
# 6:       6      6  d yellow
# 7:       6      7  e    red
# 8:       9      2  f yellow
# 9:       9      3 NA     NA
#10:       9      4  g    red
#11:       9      5  h yellow
eddi
  • 49,088
  • 6
  • 104
  • 155
  • Also, for those of you reading this who are new at this like me, I had trouble exporting the table to include the N/A rows. I had to make the command `DT=DT[setkey...` and then `write.table(DT, "file.txt")` – MiamiCG Jan 21 '15 at 21:28
  • 1
    Thank you both! Each answer worked. I had some replicated values in my "Period" column, so I had to use "allow.cartesian = TRUE": `DT[setkey(DT[,min(period):max(period), by = project], project, V1), allow.cartesian=TRUE]` – MiamiCG Jan 21 '15 at 22:54
  • 1
    @AdrienA. fixed – eddi Feb 09 '18 at 22:40
3

I don't know if this the idiomatic way or not, but I was able to achieve your desired output, by first creating an index and then subsetting the correct rows out of .SD per that index

DT[, indx := .GRP, project][, 
     .SD[CJ(unique(project), seq(min(period), max(period)))], indx]

#     indx project period v3     v4
#  1:    1       6      1  a    red
#  2:    1       6      2  b yellow
#  3:    1       6      3 NA     NA
#  4:    1       6      4 NA     NA
#  5:    1       6      5  c    red
#  6:    1       6      6  d yellow
#  7:    1       6      7  e    red
#  8:    2       9      2  f yellow
#  9:    2       9      3 NA     NA
# 10:    2       9      4  g    red
# 11:    2       9      5  h yellow
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
3

The accepted answer does not work (anymore?), but it is close.

setkey(DT,project,period)
DT[setkey(DT[, .(min(period):max(period)), by = project], project, V1)]

Note: 1. you need to make the period sequence into the list to work. 2. @MiamiCG, I am guessing you needed to allow cartesian because of not keying the table first. If you set it to TRUE, there will be no error message, but the result will not be correct.

Update: @eddi has updated his answer to match mine, so it is working.

BBB
  • 150
  • 1
  • 11