I am learning R and have a dataset with the variables "annoyTruck_transf", "annoyCar_transf" and "dBA", plus three other variables that are not relevant to this question. These variables are interval-scaled. I plotted them as boxplots, where the y-axis shows the annoyance level for car or truck sounds and the x-axis shows the volume in dBA. You can see the boxplots in the picture. As you can see, both boxplots contain some outliers. I have already run the analysis with the outliers, and now I want to repeat it without them. How do I remove them from the dataset? I have already searched for this problem, but I do not understand the solutions, especially when people use different variable names. I would be grateful for any help.
EDIT: This is one of my attempts to remove the outliers. I am not sure whether it is correct: most outliers were removed, but four new outliers appeared for cars, although there are no outliers left for trucks. I also do not know how many outliers were removed and how many values are left. How can I check that?
list_quantiles <- tapply(d2_nocars$annoyCar_transf, d2_nocars$dBA,
quantile)
Q1s <- sapply(1:17, function(i) list_quantiles[[i]][2])
Q3s <- sapply(1:17, function(i) list_quantiles[[i]][4])
IQRs <- tapply(d2_nocars$annoyCar_transf, d2_nocars$dBA, IQR)
Lowers <- Q1s - 1.5*IQRs
Uppers <- Q3s + 1.5*IQRs
datas <- split(d2_nocars, d2_nocars$dBA)
data_no_outlier <- NULL
for (i in 1:17){
out <- subset(datas[[i]], datas[[i]]$annoyCar_transf > Lowers[i]
& datas[[i]]$annoyCar_transf < Uppers[i])
data_no_outlier <- rbind(data_no_outlier, out)
}
# Now exclude the outliers from annoyTruck_transf
list_quantiles2 <- tapply(data_no_outlier$annoyTruck_transf,
data_no_outlier$dBA, quantile)
Q1s2 <- sapply(1:17, function(i) list_quantiles2[[i]][2])
Q3s2 <- sapply(1:17, function(i) list_quantiles2[[i]][4])
IQRs2 <- tapply(data_no_outlier$annoyTruck_transf,
data_no_outlier$dBA, IQR)
Lowers2 <- Q1s2 - 1.5*IQRs2
Uppers2 <- Q3s2 + 1.5*IQRs2
# Split the car-filtered data (not d2_nocars) so the first filtering step is kept
datas2 <- split(data_no_outlier, data_no_outlier$dBA)
data_no_outlier2 <- NULL
for (i in 1:17){
out2 <- subset(datas2[[i]], datas2[[i]]$annoyTruck_transf >
Lowers2[i]
& datas2[[i]]$annoyTruck_transf < Uppers2[i])
data_no_outlier2 <- rbind(data_no_outlier2, out2)
}
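To check how many rows were dropped, one option is to compare `nrow()` before and after filtering. Below is a minimal sketch of that idea on made-up data. The data frame `toy` and the helper `within_fences()` are hypothetical names used only for illustration, standing in for `d2_nocars` and the loops above. The sketch applies the same 1.5 * IQR rule per dBA level with `ave()`, but it keeps values that lie exactly on a fence, whereas the code above drops them.

```r
# Toy data standing in for d2_nocars (values are made up for illustration)
set.seed(1)
toy <- data.frame(
  dBA = rep(c(40, 50, 60), each = 20),
  annoyCar_transf = c(rnorm(19, 3), 15, rnorm(19, 5), -10, rnorm(20, 7))
)

# Hypothetical helper: TRUE where a value lies within the
# 1.5 * IQR fences computed for its own group
within_fences <- function(x, group) {
  lower <- ave(x, group, FUN = function(v) quantile(v, 0.25) - 1.5 * IQR(v))
  upper <- ave(x, group, FUN = function(v) quantile(v, 0.75) + 1.5 * IQR(v))
  x >= lower & x <= upper
}

keep <- within_fences(toy$annoyCar_transf, toy$dBA)
toy_clean <- toy[keep, ]

nrow(toy)               # number of rows before filtering
nrow(toy_clean)         # number of rows after filtering
sum(!keep)              # number of rows dropped
table(toy$dBA[!keep])   # dropped rows per dBA level
```

Note that a boxplot drawn from the filtered data recomputes the quartiles, so values that were inside the fences before can be flagged as new outliers afterwards. `boxplot()` also uses hinges from `fivenum()` rather than `quantile()`, so the flagged points may differ slightly from those found by the rule above.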
Boxplots: [image]