0

i have a data table and i would like to insert new rows imputing values between two years. This will be done over many ID groups. how do i go about replicating the data into the new rows?

# data table
dt <- data.table(ID=c(rep(1:3,each=3)), 
attrib1=rep(c("sdf","gghgf","eww"),each=3), 
attrib2=rep(c("444","222","777"),each=3), 
Year = rep(c(1990, 1995, 1996), 3), 
value = c(12,6,7,6,3,1,9,17,18))

so for all groups (ID), Year would go from 1990 to 1996 and the 2 values for 1990 & 1995 would be imputed linearly. All other attributes would remain the same and be copied into the new rows.

i've done this with a hideously long work around and attempted a custom function, but to no avail

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Sam
  • 1,400
  • 13
  • 29

1 Answers1

1

You can use tidyr::complete to expand the years and zoo::na.approx for interpolation of the values.

library(dplyr)

dt %>%
  group_by(ID, attrib1, attrib2) %>%
  tidyr::complete(Year = min(Year):max(Year)) %>%
  mutate(value = zoo::na.approx(value))

#     ID attrib1 attrib2  Year value
#   <int> <chr>   <chr>   <dbl> <dbl>
# 1     1 sdf     444      1990  12  
# 2     1 sdf     444      1991  10.8
# 3     1 sdf     444      1992   9.6
# 4     1 sdf     444      1993   8.4
# 5     1 sdf     444      1994   7.2
# 6     1 sdf     444      1995   6  
# 7     1 sdf     444      1996   7  
# 8     2 gghgf   222      1990   6  
# 9     2 gghgf   222      1991   5.4
#10     2 gghgf   222      1992   4.8
# … with 11 more rows
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213