
I am trying to remove the outliers from the variables in a dataframe and present the result in a clean way. Since a dataframe can't hold variables of different lengths, and I don't want NAs, I decided to save the variables as vectors in a list. To remove the outliers I'm using a method I saw here: https://stackoverflow.com/a/4937343/12858614. Here is a reproducible example:

a<-c(1,4,2,2,4,3,15,2)
b<-c(3,3,6,3,4,2,5,232)
df<-data.frame(a,b)

This is a dataframe with 2 variables, each with some obvious outliers. Removing the outliers works fine:

> df[[1]][!df[[1]] %in% boxplot.stats(df[[1]])$out]
[1] 1 4 2 2 4 3 2
> df[[2]][!df[[2]] %in% boxplot.stats(df[[2]])$out]
[1] 3 3 6 3 4 2 5

And making a list with a for loop with the variables unchanged also works:

> l1<-list()
> for (i in 1:2) {
+   l1[i]<-df[i]
+ }
> l1
[[1]]
[1]  1  4  2  2  4  3 15  2

[[2]]
[1]   3   3   6   3   4   2   5 232

The problem comes when I combine both methods:

> l2<-list()
> for (i in 1:2) {
+   l2[i]<-df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out]
+ }
Warning messages:
1: In l2[i] <- df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out] :
  number of items to replace is not a multiple of replacement length
2: In l2[i] <- df[[i]][!df[[i]] %in% boxplot.stats(df[[i]])$out] :
  number of items to replace is not a multiple of replacement length
> l2
[[1]]
[1] 1

[[2]]
[1] 3

I get those warnings, and only the first number of each element of the list. How can I solve this?

Antón
  • I suggest you read [this thread](https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el). In short, replace l2[i] by l2[[i]]. – marcguery Mar 03 '21 at 14:08
  • I can't believe how long I tried to solve such a trivial problem, thanks! – Antón Mar 03 '21 at 14:31
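
For anyone landing here later, the fix from the comment can be sketched as follows. With single brackets, l2[i] refers to a sub-list of length one, so assigning a 7-element vector to it keeps only the first value and raises the warning. The earlier loop worked because df[i] is itself a one-column list. Double brackets assign the whole vector to the i-th element. The helper name drop_outliers is hypothetical and only wraps the expression from the question; the lapply version is an alternative that was not part of the original thread.

```r
a <- c(1, 4, 2, 2, 4, 3, 15, 2)
b <- c(3, 3, 6, 3, 4, 2, 5, 232)
df <- data.frame(a, b)

# Hypothetical helper: same outlier filter as in the question
drop_outliers <- function(x) x[!x %in% boxplot.stats(x)$out]

# Loop version: [[i]] stores the whole vector as the i-th list element
l2 <- list()
for (i in seq_along(df)) {
  l2[[i]] <- drop_outliers(df[[i]])
}

# Alternative without a loop: a data frame is a list of columns,
# so lapply returns a list named after the columns (a, b)
l3 <- lapply(df, drop_outliers)
```

Here l2[[1]] and l2[[2]] should hold the same vectors as the two stand-alone calls in the question, and l3 holds the same values with the column names attached.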
