I am trying to remove the outliers from the variables in a dataframe and present the results in a clean way. Since a dataframe can't hold variables of different lengths, and I don't want NAs, I decided to save the variables as vectors in a list. To remove the outliers I'm using a method I saw here: https://stackoverflow.com/a/4937343/12858614. Here is a reproducible example:
a<-c(1,4,2,2,4,3,15,2)
b<-c(3,3,6,3,4,2,5,232)
df<-data.frame(a,b)
This is a dataframe with 2 variables, each with one obvious outlier (15 in a, 232 in b). Removing the outliers works fine:
> df[[1]][!df[[1]] %in% boxplot.stats(df[[1]])$out]
[1] 1 4 2 2 4 3 2
> df[[2]][!df[[2]] %in% boxplot.stats(df[[2]])$out]
[1] 3 3 6 3 4 2 5
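Since the same expression is repeated for each column, it could be wrapped in a small helper. This is only a sketch: the name `remove_outliers` is made up for illustration, and it keeps the `boxplot.stats()` approach from the linked answer unchanged.

```r
# Hypothetical helper (name chosen for illustration only):
# drops the values that boxplot.stats() flags as outliers.
remove_outliers <- function(x) {
  x[!x %in% boxplot.stats(x)$out]
}

a <- c(1, 4, 2, 2, 4, 3, 15, 2)
b <- c(3, 3, 6, 3, 4, 2, 5, 232)
df <- data.frame(a, b)

a_clean <- remove_outliers(df[[1]])
b_clean <- remove_outliers(df[[2]])
```

Here `a_clean` and `b_clean` hold the same values as the two console results above.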
Building a list in a for loop, with the variables left unchanged, also works:
> l1<-list()
> for (i in 1:2) {
+ l1[i]<-df[i]
+ }
> l1
[[1]]
[1] 1 4 2 2 4 3 15 2
[[2]]
[1] 3 3 6 3 4 2 5 232
The problem comes when I combine both methods:
> l2<-list()
> for (i in 1:2) {
+ l2[i]<-df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out]
+ }
Warning messages:
1: In l2[i] <- df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out] :
number of items to replace is not a multiple of replacement length
2: In l2[i] <- df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out] :
number of items to replace is not a multiple of replacement length
> l2
[[1]]
[1] 1
[[2]]
[1] 3
I get those warnings, and only the first number of each element of the list. How can I solve this?
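A likely explanation, offered as a sketch rather than a definitive answer: `l2[i] <- value` replaces a sub-list of length one, so when `value` is a longer vector R keeps only its first element and warns. The earlier loop worked because `df[i]` is itself a list of length one. Assigning with `[[` stores the whole vector as element `i`. The `lapply()` variant below is an alternative that is assumed to be acceptable here; it returns a list named after the columns.

```r
a <- c(1, 4, 2, 2, 4, 3, 15, 2)
b <- c(3, 3, 6, 3, 4, 2, 5, 232)
df <- data.frame(a, b)

# Option 1: same loop, but assign with [[ so the whole vector
# becomes a single list element.
l2 <- list()
for (i in seq_along(df)) {
  l2[[i]] <- df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out]
}

# Option 2: let lapply() build the list; each column is passed as x.
l3 <- lapply(df, function(x) x[!x %in% boxplot.stats(x)$out])
```

With `seq_along(df)` the loop also adapts to the number of columns instead of hard-coding `1:2`.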