I want to use boxplot.stats on a variable, split by country.

Something like boxplot.stats(data$Variable[Country==French])$out does not work. Is there a way to handle this within a single call, or do I need to split my data frame by country first?

So far what I am doing is:

FRA <- subset(data, Country=="French")
boxplot.stats(FRA$Variable1)$out

for each country. This gets unwieldy with more than a few countries, so a one-line solution would be awesome.

@duckmayr, @Dario - the data is laid out this way, and it runs to hundreds of thousands of rows. I have multiple countries and a few scale variables, and I want to flag the outlier values for each of them.

[image: sample of the data]

rainbowthug
  • 1
    $out gives the outliers from the boxplot.stats function. I want to get the outliers for each country this way. – rainbowthug Feb 09 '21 at 12:38
  • 1
    If you add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610), you make it easier for others to find and test an answer to your question. That way you can help others to help you! – dario Feb 09 '21 at 12:44

1 Answer

Since you don't provide example data, I'll demonstrate with the mtcars dataset. We'll look at a boxplot of mpg, grouping by number of cylinders:

res <- boxplot(mpg ~ cyl, data = mtcars)

We've saved the results to the object res so we can inspect the outliers. How can we do that? Consider the following from help("boxplot"):

Value

List with the following components:

[some text omitted]

out      the values of any data points which lie beyond the extremes of the whiskers.
group      a vector of the same length as out whose elements indicate to which group the outlier belongs.
names      a vector of names for the groups.

The out element of res contains the outliers, while the group element says which group each of the outliers came from, and the names element gives the names of the groups. So, we can see each outlier and which group it came from via the following:

cbind(outlier_value = res$out, outlier_group = res$names[res$group])
#      outlier_value outlier_group
# [1,] "10.4"        "8"          
# [2,] "10.4"        "8"

In other words, res$out gives the outliers, and if we subset res$names by the group indices in res$group, we get the group names corresponding to each element of res$out.
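
If you would rather keep using boxplot.stats itself, or skip drawing the plot, the following is a minimal sketch of two alternatives, again on mtcars. The object names outliers_by_group, res_noplot and outlier_table are just illustrative, and the default whisker rule (coef = 1.5) is assumed in both cases:

```r
# Option 1: apply boxplot.stats() to each group separately.
# split() breaks mpg into one vector per cyl value, and lapply() runs the
# same outlier rule on each piece, returning a named list of outliers.
outliers_by_group <- lapply(
  split(mtcars$mpg, mtcars$cyl),
  function(x) boxplot.stats(x)$out
)
print(outliers_by_group)

# Option 2: the same grouped summary as boxplot(), without drawing anything.
# With plot = FALSE, boxplot() just returns the list of statistics.
res_noplot <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
outlier_table <- data.frame(
  outlier_value = res_noplot$out,
  outlier_group = res_noplot$names[res_noplot$group]
)
print(outlier_table)
```

For the data in the question, the first option would presumably become split(data$Variable, data$Country), assuming those are the actual column names. The data.frame version also keeps the outlier values numeric, whereas cbind coerces everything to character.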

duckmayr
  • I've come up with a sample dataset, and your code seems to work brilliantly! Could you please explain how you arrived at it? Where can I read more in depth about how it works underneath? It is not so clear to me what R is doing. – rainbowthug Feb 09 '21 at 12:55
  • @rainbowthug I've added some additional explanation. Don't forget to [click on the check mark if this solves your issue](https://stackoverflow.com/help/someone-answers)! – duckmayr Feb 09 '21 at 13:04