
I recall reading that the tsibble and fable packages can handle irregular time series, but I could not find any examples showing how to do it. My questions are:

  1. Do I have to convert an irregular time series to a regular one before I can model it? (My understanding so far is that this conversion is required. Please let me know if that is not the case, and if so, which models do not need a regular time series.)
  2. What tools and models in tidyverts (tsibble, fable, fabletools) handle irregular time series?

Are there any questions or links where I can see a working example? For example, this question uses zoo/xts to handle it.

I saw some related capabilities in zoo/xts, which is good, but I am spinning my wheels trying to get this to work in fable.

For a sample dataset, we can use:

    DF <- structure(list(
      station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
      Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L),
        .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974",
                   "01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"),
        class = "factor"),
      WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999,
                    5, 5, 8.6000004, 8.1333332, 12.7999999)),
      .Names = c("station", "Time", "WaterTemp"),
      class = "data.frame", row.names = c(NA, -10L))
ok1more

1 Answer


Most models available in {fable} require the observations to be regular, and a lot of models also require that there are no gaps in the data. An example model which supports irregular data is fable::TSLM().

The above example data is typically considered 'regular' but with gaps. This is because the data has a common interval of 1 month, but some months are missing from the data. Here is how a tsibble for this data can be produced:

DF <- structure(list(
  station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L),
    .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974",
               "01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"),
    class = "factor"),
  WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999,
                5, 5, 8.6000004, 8.1333332, 12.7999999)),
  .Names = c("station", "Time", "WaterTemp"),
  class = "data.frame", row.names = c(NA, -10L))

# Fix $Time to a valid yearmonth index variable
library(tsibble)
library(dplyr)
DF <- DF %>% 
  mutate(Time = yearmonth(as.Date(format(Time), format = "%d-%m-%Y")))
DF
#>    station     Time WaterTemp
#> 1        1 1974 Jan  5.000000
#> 2        1 1974 Feb  5.000000
#> 3        1 1974 Mar  8.600000
#> 4        1 1974 May  8.133333
#> 5        1 1974 Jul 12.800000
#> 6        2 1974 Jan  5.000000
#> 7        2 1974 Feb  5.000000
#> 8        2 1974 Apr  8.600000
#> 9        2 1974 Jun  8.133333
#> 10       2 1974 Aug 12.800000

# Create a 'regular' tsibble (with gaps)
as_tsibble(DF, key = "station", index = "Time")
#> # A tsibble: 10 x 3 [1M]
#> # Key:       station [2]
#>    station     Time WaterTemp
#>      <int>    <mth>     <dbl>
#>  1       1 1974 Jan      5   
#>  2       1 1974 Feb      5   
#>  3       1 1974 Mar      8.60
#>  4       1 1974 May      8.13
#>  5       1 1974 Jul     12.8 
#>  6       2 1974 Jan      5   
#>  7       2 1974 Feb      5   
#>  8       2 1974 Apr      8.60
#>  9       2 1974 Jun      8.13
#> 10       2 1974 Aug     12.8
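
Before deciding how to treat the gaps, it can help to see where they are. The following is a minimal sketch using tsibble's gap helpers (`has_gaps()`, `scan_gaps()`, `count_gaps()`). The data frame is a compact rebuild of the sample data with rounded values, and the object name `wt_ts` is only illustrative.

```r
library(tsibble)
library(dplyr)

# Compact rebuild of the sample data (values rounded for brevity)
DF <- data.frame(
  station = rep(1:2, each = 5),
  Time = c("01-01-1974", "01-02-1974", "01-03-1974", "01-05-1974", "01-07-1974",
           "01-01-1974", "01-02-1974", "01-04-1974", "01-06-1974", "01-08-1974"),
  WaterTemp = rep(c(5, 5, 8.6, 8.13, 12.8), times = 2)
)

# 'wt_ts' is an illustrative name for the regular (monthly) tsibble
wt_ts <- DF %>%
  mutate(Time = yearmonth(as.Date(Time, format = "%d-%m-%Y"))) %>%
  as_tsibble(key = station, index = Time)

has_gaps(wt_ts)    # one row per station, with a logical .gaps column
scan_gaps(wt_ts)   # one row per missing station/month combination
count_gaps(wt_ts)  # start, end, and length of each run of missing months
```

For this data, station 1 is missing April and June, and station 2 is missing March, May, and July.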

To fill in the gaps of this dataset - similarly to what is shown in the linked question - you can use the tsibble::fill_gaps() function. This inserts explicit NA rows for the missing months, which makes the data compatible with models such as fable::ARIMA() that support missing values but do not support gaps in the data.

# Create a 'regular' tsibble (with gaps) then complete the gaps
as_tsibble(DF, key = "station", index = "Time") %>% 
  fill_gaps()
#> # A tsibble: 15 x 3 [1M]
#> # Key:       station [2]
#>    station     Time WaterTemp
#>      <int>    <mth>     <dbl>
#>  1       1 1974 Jan      5   
#>  2       1 1974 Feb      5   
#>  3       1 1974 Mar      8.60
#>  4       1 1974 Apr     NA   
#>  5       1 1974 May      8.13
#>  6       1 1974 Jun     NA   
#>  7       1 1974 Jul     12.8 
#>  8       2 1974 Jan      5   
#>  9       2 1974 Feb      5   
#> 10       2 1974 Mar     NA   
#> 11       2 1974 Apr      8.60
#> 12       2 1974 May     NA   
#> 13       2 1974 Jun      8.13
#> 14       2 1974 Jul     NA   
#> 15       2 1974 Aug     12.8
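
`fill_gaps()` has a few options beyond the default shown above. The sketch below shows two of them: `.full = TRUE`, which pads every station to the time span of the whole dataset, and filling the inserted NA values by carrying the last observation forward with `tidyr::fill()`. Carrying values forward is only one possible choice and is an assumption made here for illustration, not a recommendation for this data. The names `wt_ts`, `full_ts`, and `locf_ts` are hypothetical.

```r
library(tsibble)
library(dplyr)
library(tidyr)

# Compact rebuild of the sample data (values rounded for brevity)
DF <- data.frame(
  station = rep(1:2, each = 5),
  Time = c("01-01-1974", "01-02-1974", "01-03-1974", "01-05-1974", "01-07-1974",
           "01-01-1974", "01-02-1974", "01-04-1974", "01-06-1974", "01-08-1974"),
  WaterTemp = rep(c(5, 5, 8.6, 8.13, 12.8), times = 2)
)

wt_ts <- DF %>%
  mutate(Time = yearmonth(as.Date(Time, format = "%d-%m-%Y"))) %>%
  as_tsibble(key = station, index = Time)

# Pad every station to the full Jan-Aug span of the dataset
full_ts <- wt_ts %>%
  fill_gaps(.full = TRUE)

# Insert the missing months, then carry the last observation forward
# within each station (an illustrative choice of imputation)
locf_ts <- wt_ts %>%
  group_by_key() %>%
  fill_gaps() %>%
  tidyr::fill(WaterTemp, .direction = "down") %>%
  ungroup()
```

Whether imputing is appropriate depends on the model: if the model handles missing values itself, leaving the NA rows in place may be the better option.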

An irregular time series can be created using regular = FALSE. This is typically useful if you're working with event data. In this case you would rarely want to fill the gaps, because there are so many.

# Create an 'irregular' tsibble (no concept of gaps)
as_tsibble(DF, key = "station", index = "Time", regular = FALSE)
#> # A tsibble: 10 x 3 [!]
#> # Key:       station [2]
#>    station     Time WaterTemp
#>      <int>    <mth>     <dbl>
#>  1       1 1974 Jan      5   
#>  2       1 1974 Feb      5   
#>  3       1 1974 Mar      8.60
#>  4       1 1974 May      8.13
#>  5       1 1974 Jul     12.8 
#>  6       2 1974 Jan      5   
#>  7       2 1974 Feb      5   
#>  8       2 1974 Apr      8.60
#>  9       2 1974 Jun      8.13
#> 10       2 1974 Aug     12.8
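
To confirm which kind of tsibble you are holding, `is_regular()` reports whether an interval was inferred. A short sketch, again on a compact rebuild of the sample data, with `reg_ts` and `irr_ts` as illustrative names:

```r
library(tsibble)
library(dplyr)

# Compact rebuild of the sample data (values rounded for brevity)
DF <- data.frame(
  station = rep(1:2, each = 5),
  Time = c("01-01-1974", "01-02-1974", "01-03-1974", "01-05-1974", "01-07-1974",
           "01-01-1974", "01-02-1974", "01-04-1974", "01-06-1974", "01-08-1974"),
  WaterTemp = rep(c(5, 5, 8.6, 8.13, 12.8), times = 2)
) %>%
  mutate(Time = yearmonth(as.Date(Time, format = "%d-%m-%Y")))

reg_ts <- as_tsibble(DF, key = station, index = Time)
irr_ts <- as_tsibble(DF, key = station, index = Time, regular = FALSE)

is_regular(reg_ts)  # TRUE: a 1 month interval was inferred
is_regular(irr_ts)  # FALSE: no interval, so no concept of gaps
```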

Created on 2021-02-09 by the reprex package (v0.3.0)

  • Thanks for the information so far. My data is irregular: sensor readings taken at different times of the day at irregular intervals. I could not post that data and did not find a good example dataset to post here. I did use `regular = FALSE`, but then none of the models worked. And you are right: filling the gaps left NA for every hour except a couple of readings each day. – ok1more Feb 09 '21 at 13:32
  • I looked through your blog, and it was really helpful. Thanks a lot for the great work. Could I ask you to add some examples of how to merge the forecasts back into the original dataset elegantly? Plotting them is good, but I also need them as tables. I was able to get there by using `hilo` to convert my `fbl_ts` to a `tbl_ts`, since `dplyr::select` did not work as expected on `fbl_ts`, but I believe there is a more elegant way to do it. – ok1more Feb 09 '21 at 13:37
  • For irregular sensor data, setting `regular = FALSE` is appropriate. However, this limits the models that are suitable for this type of data. Currently `fable::TSLM()` is the only fable model which will work with irregular data. You might also consider some methods to make your data regular if appropriate (aggregation, more consistent data collection, approximation, etc.) so that you can use other time series models. Your second comment sounds like a different question; you can open a new Stack Overflow question for it. – Mitchell O'Hara-Wild Feb 09 '21 at 22:49
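
The aggregation route mentioned in the last comment could look roughly like the sketch below. The sensor readings are made up purely for illustration (the real data was not posted), and the names `readings`, `sensor`, `value`, and `hourly` are hypothetical. The timestamps are floored to the hour with `lubridate::floor_date()`, averaged per hour, and the result is turned into a regular tsibble.

```r
library(tsibble)
library(dplyr)
library(lubridate)

# Made-up irregular sensor readings (illustrative only)
readings <- tibble(
  sensor = "a",
  time = as.POSIXct(
    c("2021-02-09 00:05:00", "2021-02-09 00:40:00", "2021-02-09 01:15:00",
      "2021-02-09 03:20:00", "2021-02-09 03:50:00"),
    tz = "UTC"
  ),
  value = c(10, 12, 11, 15, 17)
)

# Aggregate to hourly means, then build a regular tsibble
hourly <- readings %>%
  mutate(hour = floor_date(time, "hour")) %>%
  group_by(sensor, hour) %>%
  summarise(value = mean(value), .groups = "drop") %>%
  as_tsibble(key = sensor, index = hour)

# The hour with no readings becomes an explicit NA row
hourly_filled <- fill_gaps(hourly)
```

The aggregation window is a modelling decision: a coarser window leaves fewer NA rows after `fill_gaps()`, at the cost of temporal detail.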