Identifying Outliers in Time-Series Data in R

Question

I have time-series data in which each value of the variable increases or decreases from the previous value within some range, say ±10%. Some data points in the series do not follow the values before or after them.

For example:

time       v1
13:01:30   0.689
13:01:31   0.697
13:01:32   0.701
13:01:33   0.713
**13:01:34   0.235**
13:01:35   0.799
13:01:36   0.813
13:01:37   0.822 
**13:01:38   0**
13:01:39   0.865
13:01:40   0.869

Is there a library in R that can help identify these outlier values (0.235 and 0 in the data)?

Update: output of `dput()`:

structure(list(time = c("13:01:30", "13:01:31", "13:01:32", "13:01:33", 
"13:01:34", "13:01:35", "13:01:36", "13:01:37", "13:01:38", "13:01:39", 
"13:01:40"), v1 = c(0.689, 0.697, 0.701, 0.713, 0.235, 0.799, 
0.813, 0.822, 0, 0.865, 0.869)), .Names = c("time", "v1"), row.names = c(NA, 
11L), class = c("tbl_df", "tbl", "data.frame"))
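A minimal sketch of the rule described in the question, using only base R: flag a point when it differs by more than 10% from both its previous and its next value. The 10% tolerance comes from the question; the "both neighbours" condition and the helper name `flag_jumps` are assumptions made for illustration.

```r
# Hypothetical helper: flag points whose relative change exceeds `tol`
# compared with BOTH the previous and the next value.
flag_jumps <- function(x, tol = 0.10) {
  n <- length(x)
  prev <- c(NA, x[-n])   # value one step earlier (NA for the first point)
  nxt  <- c(x[-1], NA)   # value one step later (NA for the last point)
  # relative deviation from each neighbour
  dev_prev <- abs(x - prev) / abs(prev)
  dev_next <- abs(x - nxt) / abs(nxt)
  flagged <- dev_prev > tol & dev_next > tol
  flagged[is.na(flagged)] <- FALSE   # undecidable cases are not flagged
  flagged
}

v1 <- c(0.689, 0.697, 0.701, 0.713, 0.235, 0.799,
        0.813, 0.822, 0, 0.865, 0.869)
which(flag_jumps(v1))   # → 5 9
```

Requiring a jump on both sides is what separates an isolated spike (0.235, 0) from the point right after it (0.799), which differs from its previous value but not from its next one. The flip side is that two or more consecutive bad values will not be caught.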

@akrun - admittedly, outliers in a set of data are different from localised outliers. For this simplified example they will give the same results, but examining the `residuals` of an `lm` fit, or `diff` comparisons, might even be worthwhile... (thelatemail, Feb 02 '16 at 04:06)
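A rough base-R sketch of the two ideas in this comment: `diff()` to find a large jump into a point followed by a large jump of the opposite sign out of it, and the `residuals()` of a straight-line `lm()` fit against the observation index. The cut-off `k = 2.5` robust standard deviations (measured with `mad()`) is an arbitrary assumption that would need tuning on real data.

```r
v1 <- c(0.689, 0.697, 0.701, 0.713, 0.235, 0.799,
        0.813, 0.822, 0, 0.865, 0.869)
idx <- seq_along(v1)
k <- 2.5   # assumed cut-off, in robust standard deviations (mad())

# 1. diff() comparisons: a lone spike shows up as a large jump into the
#    point followed by a large jump of the opposite sign out of it.
d <- diff(v1)              # d[i] = v1[i + 1] - v1[i]
jump_in  <- c(NA, d)       # change arriving at each point
jump_out <- c(d, NA)       # change leaving each point
big <- abs(jump_in) > k * mad(d) & abs(jump_out) > k * mad(d)
diff_flags <- which(big & jump_in * jump_out < 0)
diff_flags   # → 5 9

# 2. residuals() from a straight-line lm() fit against the index
fit <- lm(v1 ~ idx)
res <- residuals(fit)
lm_flags <- unname(which(abs(res) > k * mad(res)))
lm_flags     # → 5 9
```

The residual approach is the more fragile of the two here: the outliers themselves pull the fitted line, and the trend is assumed to be linear over the whole window.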

Jubbles · Accepted Answer · 2016-02-02T04:43:03.320

This may help (as a template)

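# load packages (originally run with ggplot2 2.0.0, ggrepel 0.4, dplyr 0.4.3;
# tibble() below needs a newer dplyr, which re-exports it)
library(ggplot2)
library(ggrepel)   # geom_text_repel()
library(dplyr)

# make a tibble of the OP's data (tibble() replaces the deprecated data_frame())
ts_tdf <- tibble(
    time = paste("13", "01", 30:40, sep = ":"),
    v1 = c(0.689, 0.697, 0.701, 0.713, 0.235, 0.799, 0.813, 0.822, 0.00, 0.865, 0.869)
)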

# calculate measure of central tendency (I like median)
v1_median <- median(ts_tdf$v1)

# create absolute deviation column, identify (n = 10) largest outliers, plot (sorted) values of new column
ts_tdf %>%
    mutate(abs_med = abs(v1 - v1_median)) %>%
    arrange(desc(abs_med)) %>%
    head(n = 10) %>%
    # explicit rank column instead of 1:nrow(.) inside aes();
    # `time` is already character, so it can be used as the label directly
    mutate(rank = row_number()) %>%
    ggplot(aes(x = rank, y = abs_med, label = time)) +
    geom_point() +
    geom_text_repel()
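If a yes/no flag is wanted rather than a ranked plot, one option (an assumption, not part of the answer above) is to divide the absolute deviation from the median by `mad()` to get a robust z-score and keep values beyond a cut-off. The cut-off of 3 is a common but arbitrary choice; the sketch uses base R so it runs without extra packages.

```r
ts_df <- data.frame(
  time = paste("13", "01", 30:40, sep = ":"),
  v1 = c(0.689, 0.697, 0.701, 0.713, 0.235, 0.799, 0.813, 0.822, 0.00, 0.865, 0.869),
  stringsAsFactors = FALSE
)

# robust z-score: deviation from the median, in units of the MAD
ts_df$robust_z <- (ts_df$v1 - median(ts_df$v1)) / mad(ts_df$v1)

# keep rows beyond the (assumed) cut-off of 3
outliers <- ts_df[abs(ts_df$robust_z) > 3, ]
outliers$time   # → "13:01:34" "13:01:38"
```

Like the ranking above, this still treats all values as one set, so it inherits the same weakness on trending data.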

This doesn't really work for sequential data. For example, `101:200` is a steadily increasing sequence that should not be problematic from an outlier-detection viewpoint, but may appear to have "outliers" when you take all the values as a single set. (thelatemail, Feb 02 '16 at 04:50)
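One way to address this (a swapped-in technique, not something proposed in the thread) is to compare each point with a running median of its neighbours, computed with base R's `runmed()`, instead of a single global median. The window width `k = 5` and the 10% tolerance (borrowed from the question) are assumptions to tune, and `flag_local` is a hypothetical helper name.

```r
# Hypothetical helper: flag points that deviate from the running median
# of their k-point neighbourhood by more than `tol` (relative deviation).
flag_local <- function(x, k = 5, tol = 0.10) {
  x <- as.numeric(x)
  med <- runmed(x, k, endrule = "median")  # running median over k points (k odd)
  rel_dev <- abs(x - med) / abs(med)       # relative deviation from local median
  flagged <- rel_dev > tol
  flagged[is.na(flagged)] <- FALSE         # 0/0 when x and med are both 0
  flagged
}

v1 <- c(0.689, 0.697, 0.701, 0.713, 0.235, 0.799,
        0.813, 0.822, 0, 0.865, 0.869)
which(flag_local(v1))            # → 5 9

trend <- 101:200                 # steadily increasing: nothing should be flagged
which(flag_local(trend))         # → integer(0)

trend_spike <- trend
trend_spike[80] <- 110           # locally wrong, but inside the global range
which(flag_local(trend_spike))   # → 80
```

Because the median of `k` values ignores up to `(k - 1) / 2` extreme ones, a window of 5 can still locate the local level with up to two consecutive bad points; widen `k` if longer runs of bad values are expected.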
