In a data frame, I am trying to find the data points that are more than (threshold * s.d.) away from the mean. The dim of the data frame is:
[1] 4032 4
First, I computed a rolling mean and a rolling standard deviation with rollapply from the zoo package:
library(zoo)  # provides rollapply
df$mean = rollapply(df$value, width = 2, FUN = mean, align = "right", fill = "extend")
df$sd = rollapply(df$value, width = 2, FUN = sd, align = "right", fill = "extend")
After that, head(df) looks like:
            timestamp  value   mean        sd
2007-03-14 1393577520 37.718 38.088 0.5232590
2007-03-15 1393577220 38.458 38.088 0.5232590
2007-03-16 1393576920 37.912 38.185 0.3860803
2007-03-17 1393576620 40.352 39.132 1.7253405
2007-03-18 1393576320 38.474 39.413 1.3279465
2007-03-19 1393576020 39.878 39.176 0.9927779
Then, to select the data points, I tried:
anomaly = df[df$value > abs((threshold*df$sd + df$mean) |
(df$mean - threshold*df$sd)),]
Is the above the correct way to find data points that are more than (threshold * s.d.) away from the mean? The reason I am suspicious is that the dim of anomaly is the same as that of df.
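For comparison, here is a minimal sketch of the condition as it is stated in words: a point counts as anomalous when abs(value - mean) > threshold * sd. It uses only the six rows printed by head(df) above. The threshold value of 2 is an assumption made for illustration, since the real value is not given. The sketch also reproduces the original filter to show one likely reason it keeps every row: applying | to two numeric vectors gives a logical vector, abs() turns TRUE into 1, and the test collapses to df$value > 1.

```r
# Six rows copied from the head(df) output above.
df <- data.frame(
  value = c(37.718, 38.458, 37.912, 40.352, 38.474, 39.878),
  mean  = c(38.088, 38.088, 38.185, 39.132, 39.413, 39.176),
  sd    = c(0.5232590, 0.5232590, 0.3860803, 1.7253405, 1.3279465, 0.9927779)
)
threshold <- 2  # hypothetical value, not taken from the question

# Original filter: `|` coerces both numeric vectors to logical (non-zero -> TRUE),
# abs(TRUE) is 1, so the comparison becomes df$value > 1 for every row.
original <- df[df$value > abs((threshold * df$sd + df$mean) |
                              (df$mean - threshold * df$sd)), ]

# Condition as stated in words: distance from the mean exceeds threshold * sd.
is_anomaly <- abs(df$value - df$mean) > threshold * df$sd
anomaly <- df[is_anomaly, ]

nrow(original)  # every row is kept
nrow(anomaly)   # only rows that satisfy the distance test
```

One thing to watch with width = 2: each value then sits exactly sd / sqrt(2) from its window mean, so any threshold above roughly 0.71 flags nothing. A wider window is probably needed for the test to be informative.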