I have following dataframe
d <- data.frame("Open" = rnorm(10,5,1) )
and if I insert outlier value
d$Open[4] = 100
d$Open[5] = 100
Now I want to replace these outlier value with normal values.
I tried replacing it with previous value but it doesn't work if there are outliers one after the other. or the very first element is an outlier.
d$Temp <- lag(d$Open ,1)
d$Temp <- ifelse(is.na(d$Temp), d$Open, d$Temp)
d$Open <- ifelse(
( ( d$Open - d$Temp)/ d$Temp) > 3,
lag(d$Open, 1) ,
d$Open
)
d <- d[,1]
Is there any better way to remove them?
Currently I have defined outlier value as any value which is 300% more than the previous value.
My definition of outlier is wrong that's why I am asking for any new method. I guess more than 100% than the median would be more appropriate for definition of outlier as mentioned in the comment. Thanks :)
My example was simple but if I use it with stock price then @Tarjae's method of identifying outlier doesn't work
dput(round(d$Open,0))
c(5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 6, 6, 5, 6, 7, 6, 6, 7, 6,
6, 5, 6, 7, 8, 8, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5,
5, 6, 6, 6, 5, 5, 6, 6, 7, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6,
5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 7, 7, 7,
7, 7, 9, 8, 8, 9, 8, 8, 10, 11, 10, 10, 10, 9, 12, 10, 9, 10,
9, 11, 11, 11, 11, 10, 10, 10, 8, 8, 9, 8, 8, 8, 8, 7, 7, 8,
8, 8, 8, 9, 8, 6, 8, 8, 7, 7, 7, 8, 8, 9, 10, 10, 11, 11, 10,
10, 11, 11, 11, 11, 13, 13, 14, 13, 13, 15, 16, 16, 16, 18, 17,
17, 17, 17, 18, 16, 15, 14, 17, 16, 17, 16, 16, 17, 18, 23, 19,
18, 18, 17, 17, 16, 17, 18, 17, 16, 16, 16, 16, 15, 16, 16, 18,
22, 24, 24, 30, 31, 31, 31, 32, 40, 41, 39, 38, 36, 34, 33, 33,
38, 38, 36, 34, 34, 36, 33, 35, 34, 33, 33, 34, 33, 33, 34, 33,
38, 37, 38, 37, 40, 42, 40, 43, 45, 46, 40, 39, 41, 33, 30, 32,
29, 32, 30, 30, 29, 32, 31, 34, 35, 34, 34, 33, 32, 31, 32, 33,
31, 30, 31, 36, 36, 34, 35, 36, 33, 32, 33, 32, 35, 35, 40, 40,
42, 38, 39, 39, 36, 37, 38, 36, 38, 39, 40, 40, 40, 41, 45, 49,
50, 50, 49, 48, 50, 49, 50, 57, 58, 54, 54, 54, 49, 49, 58, 54,
50, 52, 70, 75, 77, 78, 82, 98, 102, 121, 129, 126, 125, 124,
118, 138, 147, 126, 107, 97, 100, 89, 103, 93, 93, 74, 72, 68,
82, 82, 81, 95, 98, 94, 96, 95, 94, 90, 85, 91, 76, 72, 72, 78,
81, 83, 85, 84, 80, 80, 79, 78, 65, 63, 46, 55, 41, 36, 32, 40,
33, 24, 21, 19, 21, 30, 27, 28, 30, 25, 24, 21, 24, 25, 24, 22,
22, 20, 23, 25, 27, 30, 34, 34, 37, 35, 41, 49, 58, 55, 51, 54,
52, 51, 50, 41, 44, 51, 54, 48, 52, 54, 54, 57, 77, 70, 69, 68,
68, 74, 70, 62, 74, 73, 74, 73, 72, 76, 73, 79, 82, 88, 91, 90,
82, 81, 82, 86, 88, 94, 91, 95, 97, 103, 102, 94, 102, 93, 87,
79, 79, 80, 77, 80, 83, 86, 81, 82, 80, 84, 85, 83, 83, 84, 85,
92, 88, 90, 88, 91, 93, 90, 91, 87, 84, 89, 81, 75, 76, 70, 68,
71, 74, 68, 68, 64, 54, 58, 59, 58, 52, 52, 52, 54, 61, 59, 57,
64, 63, 64, 60, 58, 54, 55, 55, 56, 51, 54, 56, 52, 51, 49, 41,
39, 38, 37, 37, 35, 38, 36, 34, 32, 32, 31, 33, 32, 32, 28, 25,
25, 24, 22, 23, 23, 23, 26, 31, 31, 32, 32, 37, 36, 37, 36, 36,
34, 32, 32, 33, 31, 30, 30, 27, 24, 25, 25, 25, 24, 25, 30, 30,
30, 31, 29, 27, 27, 27, 27, 24, 24, 22, 24, 25, 26, 28, 30, 30,
29, 28, 30, 31, 31, 33, 34, 34, 34, 33, 35, 32, 30, 30, 31, 29,
27, 27, 26, 26, 23, 21, 24, 25, 26, 25, 24, 27, 25, 25, 24, 24,
24, 23, 22, 24, 24, 25, 24, 23, 22, 22, 20, 21, 21, 22, 22, 22,
21, 23, 24, 24, 25, 26, 26, 28, 26, 27, 27, 27, 30, 30, 33, 36,
33, 31, 30, 32, 31, 33, 30, 30, 29, 34, 35, 37, 40, 37, 37, 37,
37, 41, 42, 41, 44, 41, 39, 38, 43, 38, 40, 39, 42, 44, 43, 39,
41, 40, 38, 34, 24, 24, 23, 25, 27, 28, 27, 32, 28, 27, 29, 26,
27, 26, 29, 28, 28, 29, 31, 28, 28, 31, 29, 28, 25, 26, 23, 23,
24, 23, 20, 21, 20, 21, 20, 18, 17, 18, 21, 19, 19, 19, 21, 19,
19, 19, 17, 16, 15, 14, 16, 15, 14, 15, 17, 17, 18, 16, 15, 15,
14, 13, 14, 12, 12, 12, 12, 11, 9, 8, 9, 9, 8, 8, 7, 10, 11,
12, 11, 11, 12, 14, 14, 15, 14, 14, 12, 13, 12, 12, 14, 15, 17,
16, 16, 16, 17, 17, 15, 15, 13, 14, 13, 13, 13, 13, 15, 14, 15,
17, 16, 14, 12, 15, 16, 15, 16, 15, 15, 18, 21, 20, 19, 19, 18,
19, 18, 19, 18, 18, 18, 18, 21, 20, 22, 20, 20, 20, 19, 18, 18,
19, 20, 18, 19, 19, 20, 21, 23, 22, 20, 20, 19, 20, 22, 23, 25,
25, 28, 26, 27, 26, 25, 25, 24, 23, 22, 23, 21, 23, 24, 26, 30,
28, 27, 20, 23, 21, 24, 22, 19, 19, 18, 20, 25, 24, 25, 23, 24,
23, 23, 24, 22, 32, 29, 28, 27, 27, 24, 25, 28, 28, 29, 28, 28,
30, 34, 33, 32, 29, 26, 29, 27, 27, 41, 43, 42, 40, 41, 34, 36,
37, 36, 36, 33, 34, 32, 32, 30, 29, 33, 34, 36, 34, 37, 41, 40,
36, 33, 33, 32, 32, 33, 33, 32, 31, 30, 27, 30, 30, 29, 26, 31,
26, 26, 23, 23, 27, 27, 29, 26, 27, 27, 26, 28, 29, 31, 33, 31,
31, 29, 27, 28, 28, 27, 27, 28, 27, 26, 25, 26, 24, 25, 24, 20,
17, 14, 16, 15, 15, 15, 14, 15, 14, 14, 14, 15, 16, 15, 21, 20,
19, 19, 18, 18, 18, 22, 26, 26, 28, 27, 28, 24, 25, 24, 22, 22,
22, 21, 24, 26, 27, 26, 26, 27, 27, 34, 37, 34, 32, 30, 32, 34,
30, *2900*, 31, 36, 32, 34, 33, 37, 37, 37, 42, 57, 52, 54, 52,
51)
In the above data there is only one outlier which is 2900