Are your outliers distorting your variables so badly that you have to modify the data just to read the distributions? Why not leave the data alone and check the documentation of the boxplot function, which can hide the outliers (the dots on R's boxplot) and show everything else? I could see outliers distorting the mean. But the black line in a boxplot is the median, and that is not so easily distorted by outliers.
You can see a few outliers here:
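A quick sketch of that last point, in case it helps. The numbers are made up purely for illustration, and x and x_out are just names I picked.

```r
# Toy data (invented): five ordinary values, then the same values plus one extreme point
x     <- c(10, 12, 13, 15, 16)
x_out <- c(x, 500)

mean(x)        # 13.2
mean(x_out)    # about 94.3, dragged far away by the single extreme value
median(x)      # 13
median(x_out)  # 14, barely moves
```

The mean jumps by a factor of seven while the median shifts by one, which is why the black line of the boxplot stays readable even with outliers present.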
boxplot(airquality$Ozone ~ airquality$Month)
I wonder how I make a boxplot without outliers? How about I look at the documentation?
?boxplot
boxplot(airquality$Ozone ~ airquality$Month, outline = FALSE)
What do you know? The outliers aren't there anymore. Outliers are drawn by default because outline defaults to TRUE. Set it to FALSE and they are not drawn; the data itself is untouched.
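If you want to convince yourself that outline only changes the drawing, here is a minimal sketch. It assumes a throwaway graphics device (pdf(NULL)) so nothing pops up on screen; shown and hidden are names I made up.

```r
pdf(NULL)  # null device: plots are drawn nowhere, so this runs without opening a window

# boxplot() invisibly returns the statistics it used for the plot
shown  <- boxplot(Ozone ~ Month, data = airquality)
hidden <- boxplot(Ozone ~ Month, data = airquality, outline = FALSE)

dev.off()

identical(shown$stats, hidden$stats)  # TRUE: boxes and whiskers are the same
identical(shown$out, hidden$out)      # TRUE: the outliers are still recorded, just not drawn
```

So hiding the points does not recompute anything; the boxes, whiskers and median stay exactly where they were.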
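If you want to do the same for your data, just add outline = FALSE. (I left out id.method, which as far as I know is an argument of Boxplot in the car package, not of base boxplot.)
boxplot(xf$V1, outline = FALSE)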
Now suppose I want to remove some outliers from a column of the airquality data frame.
View(airquality)
Then I can find the outliers of the Ozone column like so (plot = FALSE returns the boxplot statistics without drawing anything) ...
ozone <- boxplot(airquality$Ozone, outline = FALSE, plot = FALSE)
Let's see what we can take from here variable-wise. The outlier points of the Ozone column are stored in ozone$out.
To show the outliers in Ozone, just do this.
intersect(airquality$Ozone, ozone$out)
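In case you wonder how those points end up in $out: as far as I read the documentation, a point is flagged when it lies more than 1.5 times the box height beyond the edge of the box (the range argument of boxplot, called coef in boxplot.stats). Here is a rough sketch that recomputes the rule by hand; s, hinges, fence and flagged are names I made up.

```r
oz <- airquality$Ozone

# boxplot.stats() returns the statistics that boxplot() draws (coef = 1.5 by default)
s <- boxplot.stats(oz)

hinges <- s$stats[c(2, 4)]                      # lower and upper edge of the box
fence  <- hinges + c(-1.5, 1.5) * diff(hinges)  # limits beyond which a point counts as an outlier

# Flag by hand, skipping the missing values
flagged <- oz[!is.na(oz) & (oz < fence[1] | oz > fence[2])]

identical(sort(flagged), sort(s$out))  # the hand-made rule and $out should agree
```

If 1.5 is too strict or too lenient for your data, changing range in boxplot (or coef in boxplot.stats) moves the fence accordingly.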
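To show everything else in Ozone, just do this. (I use %in% instead of setdiff here, because setdiff also throws away duplicated values, which would change the distribution.)
airquality$Ozone[!airquality$Ozone %in% ozone$out]
I can pass this right to the boxplot function without specifying outline = FALSE, and I get the boxplot without the two outlier points.
boxplot(airquality$Ozone[!airquality$Ozone %in% ozone$out])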
If you want to handle all your data at once, I would try tampering with it. In my case I'm tampering with a data frame called airquality. This draws a boxplot for every column and keeps the statistics of each one.
tamper <- apply(airquality, 2, FUN = boxplot)
See all the things you can tamper with (in RStudio, typing tamper$ and pressing Tab lists them as well).
names(tamper)
tamper$Ozone
tamper$Ozone$out
It takes a for loop to pull the outliers (out) out of every column.
But I have them all in one variable.
Now you can see the outliers in all 6 columns of airquality. Only two columns, 1 (Ozone) and 3 (Wind), have outliers, and the loop prints them.
for (i in seq_along(tamper)) print(tamper[[i]]$out)
[1] 135 168
numeric(0)
[1] 20.1 18.4 20.7
numeric(0)
numeric(0)
numeric(0)
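If you do decide to take the outliers out of every column, one possible approach is to blank them to NA instead of deleting them, so the rows stay lined up across columns. This is only a sketch of that idea: outs and clean are names I made up, and it uses boxplot.stats so nothing gets drawn.

```r
# Outliers per column, computed without drawing anything
outs <- lapply(airquality, function(col) boxplot.stats(col)$out)
outs[lengths(outs) > 0]  # keep only the columns that actually have outliers

# Work on a copy and replace the flagged values with NA
clean <- airquality
for (nm in names(outs)) {
  clean[[nm]][clean[[nm]] %in% outs[[nm]]] <- NA
}

# Number of values blanked per column
colSums(is.na(clean)) - colSums(is.na(airquality))
```

The original airquality stays as it is, and functions such as boxplot or median(..., na.rm = TRUE) can work on clean directly.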