
I'm quite a beginner with R and R packages in general. I'd like to ask whether there is a clear solution to the problem below. I've imported my data in .csv format, as you can see in the following picture:

https://dl.dropboxusercontent.com/u/23801982/1234.jpg

The data are grouped by entity, year and month, and the remaining columns hold the 4 parameters. I also produced a box plot for one of them, e.g. the Abstractions column, as follows:

https://dl.dropboxusercontent.com/u/23801982/1234566.jpg

Now I'm trying to identify the outliers, which I did with the boxplot.stats command.

But because the data are grouped, I don't know how to exclude the outliers from the results and export them to a new file (e.g. .txt or .csv). I also saw a manual way to calculate them with the IQR, but I don't think it fits the exportable dataset I need.

The code I used so far is:

rm(list = ls())
library("gdata")

s1 <- read.csv("C:\\Users\\G\\Documents\\R\\Projects\\20141125.csv", header = TRUE)

boxplot(s1$Abstractions ~ s1$Entity, col="green", srt=45) 

boxplot.stats(s1$Abstractions)

Thank you

biobirdman
GeoBar
  • Welcome to StackOverflow! Please read the info about how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). This will make it much easier for others to help you. – Jaap Nov 25 '14 at 14:05
  • possible duplicate of [Removing outliers easily in R](http://stackoverflow.com/questions/15160485/removing-outliers-easily-in-r) – Jaap Nov 25 '14 at 14:09

1 Answer


You are looking at the right function: boxplot.stats.

To see what a function in R does, you can use

?functionName

so try

?boxplot.stats

and you will see that it returns the outlier values in a list component called out:

Value:

     List with named components as follows:

   stats: a vector of length 5, containing the extreme of the lower
          whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and
          the extreme of the upper whisker.

       n: the number of non-‘NA’ observations in the sample.

    conf: the lower and upper extremes of the ‘notch’ (‘if(do.conf)’).
          See the details.

     out: the values of any data points which lie beyond the extremes
          of the whiskers (‘if(do.out)’).
     Note that ‘$stats’ and ‘$conf’ are sorted in _in_creasing order,
     unlike S, and that ‘$n’ and ‘$out’ include any ‘+- Inf’ values.
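To make the help text concrete, here is a small sketch on a made-up vector (the values are only for illustration and are not from the question's data). It shows how to pull the individual components out of the returned list.

```r
# Made-up sample: five close values and one that sits far away from them.
x <- c(1, 2, 3, 4, 5, 100)

res <- boxplot.stats(x)

res$stats  # whisker ends, hinges and median (length 5)
res$n      # number of non-NA observations, here 6
res$out    # values beyond the whiskers, here 100
```

With the default coef = 1.5, a value counts as an outlier when it lies more than 1.5 times the hinge spread below the lower hinge or above the upper hinge.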

So to remove the outliers you can do something like this:

outliersValue <- boxplot.stats(x)$out
x[!x %in% outliersValue]

where x is your data as a numeric vector, e.g. s1$Abstractions.

The %in% operator checks, for each element of x, whether it occurs in outliersValue. The ! operator negates that result, so the expression is TRUE for the elements of x that are not found in outliersValue, and only those elements are kept.
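The snippet above treats x as one vector, so the whiskers are computed over all entities together. For grouped data like in the question, one possible approach is to flag outliers within each Entity separately and then write the result to a file. This is only a sketch: the data frame below is a made-up stand-in for the CSV (only the column names Entity and Abstractions come from the question), and the output file names and the use of tempdir() are placeholders.

```r
# Hypothetical stand-in for the imported CSV; values are invented.
s1 <- data.frame(
  Entity       = rep(c("A", "B"), each = 6),
  Abstractions = c(10, 11, 12, 13, 14, 90,      # 90 is extreme within A
                   100, 105, 110, 115, 120, 5)  # 5 is extreme within B
)

# ave() applies FUN to the Abstractions values of each Entity and returns
# a vector aligned with the rows of s1. The logical result is stored as
# 0/1, so it is compared with 1 to get a logical flag again.
is_outlier <- ave(
  s1$Abstractions, s1$Entity,
  FUN = function(v) v %in% boxplot.stats(v)$out
) == 1

s1_clean    <- s1[!is_outlier, ]  # rows kept
s1_outliers <- s1[is_outlier, ]   # rows flagged as outliers

# Placeholder output location; replace with your own folder and file names.
out_dir <- tempdir()
write.csv(s1_clean, file.path(out_dir, "abstractions_clean.csv"),
          row.names = FALSE)
write.csv(s1_outliers, file.path(out_dir, "abstractions_outliers.csv"),
          row.names = FALSE)
```

If the outliers should be judged per entity, year and month, the extra grouping columns can be passed to ave() as additional arguments after s1$Entity. Keep in mind that very small groups give boxplot.stats little to work with.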

I hope you find this useful. Happy R-ing

biobirdman