In the formula you used to select influential observations, the condition should be as follows: if an observation's Cook's distance is more than 4 times the mean Cook's distance, it can be considered influential (and is potentially an outlier).
Cook's distance or Cook's D is a commonly used estimate of the influence of a data point
when performing a least-squares regression analysis.
In a practical ordinary least squares analysis, Cook's distance can be used in several ways:
to indicate influential data points that are particularly worth checking for validity; or to
indicate regions of the design space where it would be good to be able to obtain more data points.
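To make the quantity concrete, here is a minimal sketch that computes Cook's distance by hand from the residuals and leverages and compares it with R's `cooks.distance()`. The built-in `mtcars` data and the model `mpg ~ wt + hp` are assumptions chosen only for illustration; they are not part of the question.

```r
# Illustrative model on a built-in data set (an assumption, not the ozone data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

e  <- residuals(fit)               # raw residuals
h  <- hatvalues(fit)               # leverages (diagonal of the hat matrix)
p  <- length(coef(fit))            # number of coefficients, including the intercept
s2 <- sum(e^2) / df.residual(fit)  # estimate of the residual variance

# Cook's distance from its definition:
# D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
d_manual <- (e^2 / (p * s2)) * (h / (1 - h)^2)

# Compare with the built-in function (agreement up to floating-point error).
all.equal(d_manual, cooks.distance(fit))
```

The formula shows why a point needs both a sizeable residual and high leverage to get a large Cook's distance.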
In general use, observations with a Cook's distance greater than 4 times the mean
may be classified as influential. This is not a hard boundary.
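The rule of thumb can be wrapped in a small helper. This is only a sketch: `flag_influential` and its `k` argument are hypothetical names introduced here, and `mtcars` is again an assumed stand-in data set. Because the cutoff is not a hard boundary, `k` is left adjustable.

```r
# Hypothetical helper: flag observations whose Cook's distance exceeds
# k times the mean Cook's distance (k = 4 is the common rule of thumb).
flag_influential <- function(fit, k = 4) {
  d <- cooks.distance(fit)
  cutoff <- k * mean(d, na.rm = TRUE)
  # which() drops NA values and keeps the row labels as names
  which(d > cutoff)
}

# Illustrative use on a built-in data set (an assumption, not the ozone data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
idx <- flag_influential(fit)

# Index by row label, so this also works when row names are not numeric.
mtcars[names(idx), c("mpg", "wt", "hp")]
```

Selecting rows by `names(idx)` rather than converting the names with `as.numeric()` avoids problems when the data frame has character row names, as `mtcars` does.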
As an example, here is how influential observations can be identified in the ozone
data set:
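# Load the ozone data and fit a linear model on all predictors
ozone <- read.csv("http://rstatistics.net/wp-content/uploads/2015/09/ozone.csv")
m <- lm(ozone_reading ~ ., data = ozone)
# Cook's distance for every observation used in the fit
cooksdistance <- cooks.distance(m)
# Row labels of observations above 4 times the mean Cook's distance
influential <- as.numeric(names(cooksdistance)[cooksdistance > 4 * mean(cooksdistance, na.rm = TRUE)])
ozone[influential, ]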
# Month Day_of_month Day_of_week ozone_reading pressure_height Wind_speed Humidity Temperature_Sandburg Temperature_ElMonte
# 19 1 19 1 4.07 5680 5 73 52 56.48
# 23 1 23 5 4.90 5700 5 59 69 51.08
# 58 2 27 5 22.89 5740 3 47 53 58.82
# 133 5 12 3 33.04 5880 3 80 80 73.04
# 135 5 14 5 31.15 5850 4 76 78 71.24
# 149 5 28 5 4.82 5750 3 76 65 51.08
# 243 8 30 1 37.98 5950 5 62 92 82.40
# 273 9 29 3 4.60 5640 5 93 63 54.32
# 286 10 12 2 7.00 5830 8 77 71 67.10
# Inversion_base_height Pressure_gradient Inversion_temperature Visibility
# 19 393 -68 69.80 10
# 23 3044 18 52.88 150
# 58 885 -4 67.10 80
# 133 436 0 86.36 40
# 135 1181 50 79.88 17
# 149 3644 86 59.36 70
# 243 557 0 90.68 70
# 273 5000 30 52.70 70
# 286 337 -17 81.14 20
Interpretation:
Rows 58, 133 and 135 have very high ozone_reading.
Rows 23, 149 and 273 have very high Inversion_base_height.
Row 19 has very low Pressure_gradient.