
I'm trying to fill in NA values with numbers that show exponential growth. Below is a data sample of what I'm trying to do.


library(tidyverse)

expand.grid(X2009H1N1 = "0-17 years",
            type = "Cases",
            month = seq(as.Date("2009-04-12") , to = as.Date("2010-03-12"), by = "month")) %>% 
  bind_cols( data.frame(
    MidLevelRange = c(0,NA,NA,NA,NA,NA,8000000,16000000,18000000,19000000,19000000,19000000),
    lowEst = c(0,NA,NA,NA,NA,NA,5000000,12000000,12000000,13000000,14000000,14000000)
  ))

I have used %>% arrange(month, X2009H1N1) %>% group_by(X2009H1N1, type ) %>% mutate(aprox_MidLevelRange = zoo::na.approx(MidLevelRange, na.rm = FALSE)) but the result does not look exponential to me. Thanks

user3357059

3 Answers

Your result is not exponential because na.approx() imputes the missing values by linear interpolation. The zoo package you are using also offers cubic spline interpolation through the na.spline() function, but that function does not produce an exponential curve either.

x <- expand.grid(X2009H1N1 = "0-17 years",
                 type = "Cases",
                 month = seq(as.Date("2009-04-12"), 
                             to = as.Date("2010-03-12"), 
                             by = "month")) %>% 
  bind_cols(data.frame(MidLevelRange = c(0,NA,NA,NA,NA,NA,8000000,16000000,18000000,19000000,19000000,19000000),
                       lowEst = c(0,NA,NA,NA,NA,NA,5000000,12000000,12000000,13000000,14000000,14000000)))

x %>% arrange(month, X2009H1N1) %>% 
  group_by(X2009H1N1, type) %>% 
  mutate(aprox_MidLevelRange = zoo::na.spline(MidLevelRange))

The problem with cubic spline interpolation is that the missing values at the low end of the series are interpolated as negative numbers, which may or may not be acceptable for your purpose:

# A tibble: 8 x 6
# Groups:   X2009H1N1, type [1]
  X2009H1N1  type  month      MidLevelRange   lowEst aprox_MidLevelRange
  <fct>      <fct> <date>             <dbl>    <dbl>               <dbl>
1 0-17 years Cases 2009-04-12             0        0                  0 
2 0-17 years Cases 2009-05-12            NA       NA          -18568160.
3 0-17 years Cases 2009-06-12            NA       NA          -25223342.
4 0-17 years Cases 2009-07-12            NA       NA          -22929832.
5 0-17 years Cases 2009-08-12            NA       NA          -14651914.
6 0-17 years Cases 2009-09-12            NA       NA           -3353875.
7 0-17 years Cases 2009-10-12       8000000  5000000            8000000.
knytt

Have a look at the imputeTS package, which offers many imputation functions for time series. Take a look at this paper for a good overview of all the available options.

In your case, Stineman interpolation (imputeTS::na_interpolation(x, option = "stine")) could be a suitable option.

Here it is applied to the example you provided:

x <- expand.grid(
  X2009H1N1 = "0-17 years",
  type = "Cases",
  month = seq(as.Date("2009-04-12"),
    to = as.Date("2010-03-12"),
    by = "month"
  )
) %>%
  bind_cols(data.frame(
    MidLevelRange = c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000, 19000000, 19000000, 19000000),
    lowEst = c(0, NA, NA, NA, NA, NA, 5000000, 12000000, 12000000, 13000000, 14000000, 14000000)
  ))

x %>%
  arrange(month, X2009H1N1) %>%
  group_by(X2009H1N1, type) %>%
  mutate(aprox_MidLevelRange = imputeTS::na_interpolation(MidLevelRange, option = "stine"))

This gives you:

# A tibble: 12 x 6
# Groups:   X2009H1N1, type [1]
   X2009H1N1  type  month      MidLevelRange   lowEst aprox_MidLevelRange
   <fct>      <fct> <date>             <dbl>    <dbl>               <dbl>
 1 0-17 years Cases 2009-04-12             0        0                  0 
 2 0-17 years Cases 2009-05-12            NA       NA             593718.
 3 0-17 years Cases 2009-06-12            NA       NA            1335612.
 4 0-17 years Cases 2009-07-12            NA       NA            2289061.
 5 0-17 years Cases 2009-08-12            NA       NA            3559604.
 6 0-17 years Cases 2009-09-12            NA       NA            5336975.
 7 0-17 years Cases 2009-10-12       8000000  5000000            8000000 
 8 0-17 years Cases 2009-11-12      16000000 12000000           16000000 
 9 0-17 years Cases 2009-12-12      18000000 12000000           18000000 
10 0-17 years Cases 2010-01-12      19000000 13000000           19000000 
11 0-17 years Cases 2010-02-12      19000000 14000000           19000000 
12 0-17 years Cases 2010-03-12      19000000 14000000           19000000 

So, comparing just the interpolation functions, I guess this could be the best option.

Plot the different interpolation options yourself to see the differences. In general, these are the interpolation options:

imputeTS::na_interpolation(x, option ="linear")
imputeTS::na_interpolation(x, option ="spline")
imputeTS::na_interpolation(x, option ="stine")

The linear and spline options from imputeTS give the same results as zoo::na.approx() and zoo::na.spline(). The stine option does not exist in zoo.
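To make the comparison concrete, here is a minimal base R sketch of the linear and spline fills on the MidLevelRange values from the question. It assumes the observations are equally spaced, so a plain index (the hypothetical variable idx) stands in for the month, and it uses stats::approx() and stats::spline() directly. Stineman interpolation is left out because it needs an extra package.

```r
# Known values from the question; NA marks the months to fill.
y <- c(0, NA, NA, NA, NA, NA, 8e6, 16e6, 18e6, 19e6, 19e6, 19e6)
idx <- seq_along(y)      # assumption: equally spaced observations
known <- !is.na(y)

# Linear interpolation between the known points.
fill_linear <- approx(idx[known], y[known], xout = idx)$y

# Cubic spline interpolation (stats::spline() with its default method, "fmm").
fill_spline <- spline(idx[known], y[known], xout = idx)$y

comparison <- data.frame(idx, y, fill_linear, fill_spline)
print(comparison)

# Draw both fills when running interactively.
if (interactive()) {
  matplot(idx, comparison[, c("fill_linear", "fill_spline")],
          type = "b", pch = 1:2, lty = 1:2, col = 1:2,
          xlab = "month index", ylab = "MidLevelRange")
  legend("topleft", legend = c("linear", "spline"),
         pch = 1:2, lty = 1:2, col = 1:2)
}
```

Both fills pass through the known values. They differ only inside the gap, which is where it is worth checking the plot for negative or otherwise implausible values.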

Steffen Moritz

I do not think that exponential growth can come from zero. Should the question be reframed?

The following method does produce an exponential fill. The idea is that exponential growth is linear on a log-scale. So you can log-transform the data (which only works for strictly positive series), apply linear interpolation, and then transform back to the exponential scale.

The worked example below begins the series with 0.001 instead of 0.

x <- expand.grid(X2009H1N1 = "0-17 years",
                 type = "Cases",
                 month = seq(as.Date("2009-04-12"), 
                             to = as.Date("2010-03-12"), 
                             by = "month")) %>% 
  bind_cols(data.frame(MidLevelRange = c(0.001,NA,NA,NA,NA,NA,8000000,16000000,18000000,19000000,19000000,19000000),
                       lowEst = c(0,NA,NA,NA,NA,NA,5000000,12000000,12000000,13000000,14000000,14000000)))
# Interpolate linearly on the log scale, then transform back with exp()
x <- x %>% arrange(month, X2009H1N1) %>% 
  group_by(X2009H1N1, type) %>% 
  mutate(aprox_MidLevelRange = exp(zoo::na.approx(log(MidLevelRange), na.rm = FALSE)))

This produces:

# A tibble: 12 × 6
# Groups:   X2009H1N1, type [1]
   X2009H1N1  type  month      MidLevelRange   lowEst aprox_MidLevelRange
   <fct>      <fct> <date>             <dbl>    <dbl>               <dbl>
 1 0-17 years Cases 2009-04-12         0.001        0              0.001 
 2 0-17 years Cases 2009-05-12        NA           NA              0.0447
 3 0-17 years Cases 2009-06-12        NA           NA              2.00  
 4 0-17 years Cases 2009-07-12        NA           NA             89.4   
 5 0-17 years Cases 2009-08-12        NA           NA           4000.    
 6 0-17 years Cases 2009-09-12        NA           NA         178885.    
 7 0-17 years Cases 2009-10-12   8000000      5000000        8000000     
 8 0-17 years Cases 2009-11-12  16000000     12000000       16000000     
 9 0-17 years Cases 2009-12-12  18000000     12000000       18000000     
10 0-17 years Cases 2010-01-12  19000000     13000000       19000000.    
11 0-17 years Cases 2010-02-12  19000000     14000000       19000000.    
12 0-17 years Cases 2010-03-12  19000000     14000000       19000000.    
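As a quick check that this fill really is exponential, here is a minimal base R sketch that repeats the log-scale interpolation with stats::approx() and looks at the month-to-month ratio inside the gap. The variable names (idx, fill_exp, growth) are only for illustration, and the sketch assumes equally spaced observations and the same 0.001 starting value as above.

```r
# Same series as above, starting at 0.001 instead of 0.
y <- c(0.001, NA, NA, NA, NA, NA, 8e6, 16e6, 18e6, 19e6, 19e6, 19e6)
idx <- seq_along(y)      # assumption: equally spaced observations
known <- !is.na(y)

# Interpolate linearly on the log scale, then back-transform.
fill_exp <- exp(approx(idx[known], log(y[known]), xout = idx)$y)

# Inside the gap (rows 1 to 7), each value is the previous one
# multiplied by the same constant factor.
growth <- fill_exp[2:7] / fill_exp[1:6]
print(growth)
```

The ratio is constant at (8000000 / 0.001)^(1/6), roughly 44.7, which is what exponential growth means. Note that this factor depends directly on the starting value, so replacing 0.001 with another small positive number changes the whole filled curve.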
Achim Zeileis
Stephen