I made a box plot comparing the ages of male Olympic swimmers by whether or not they earned a medal. How can I get a five-number summary for each box, one for the no-medal group and one for the medal group? (I converted medal to a factor.) I tried `summary(age,medal.f)` and `summary(age~medal.f)`, but neither works, and I don't know how to separate the two groups. Any thoughts on how to do this?
user2821333
1 Answer
The easiest way to get this information is to save the result of your `boxplot()` call and extract the `$stats` component. Using the built-in `ToothGrowth` data set:
b <- boxplot(len~supp,data=ToothGrowth)
b$stats
##      [,1] [,2]
## [1,]  8.2  4.2
## [2,] 15.2 11.2
## [3,] 22.7 16.5
## [4,] 25.8 23.3
## [5,] 30.9 33.9
More generally, you can do this by hand with something like
with(data,lapply(split(age,medal),boxplot.stats))
There are many other solutions involving `by()` or the `plyr`, `dplyr`, or `data.table` packages ...
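As one illustration of the `by()` route, here is a minimal base-R sketch on `ToothGrowth`. It uses `fivenum()`, which is an addition not mentioned in the answer; `fivenum()` returns the minimum, lower hinge, median, upper hinge, and maximum, so it matches the `$stats` component of `boxplot.stats()` only when a group has no outliers (when there are outliers, `boxplot.stats()` pulls the first and last values in to the whisker ends). The object name `five` is just illustrative.

```r
# Five-number summary of len for each level of supp, printed group by group
by(ToothGrowth$len, ToothGrowth$supp, fivenum)

# The same summaries collected into a matrix, one column per group
five <- sapply(split(ToothGrowth$len, ToothGrowth$supp), fivenum)
rownames(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
five
```

If you need the whisker ends rather than the raw minimum and maximum, stick with `boxplot.stats()`.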
Again using `ToothGrowth`:
(bps <- with(ToothGrowth,lapply(split(len,supp),boxplot.stats)))
$OJ
$OJ$stats
[1] 8.2 15.2 22.7 25.8 30.9
$OJ$n
[1] 30
$OJ$conf
[1] 19.64225 25.75775
$OJ$out
numeric(0)
$VC
$VC$stats
[1] 4.2 11.2 16.5 23.3 33.9
$VC$n
[1] 30
$VC$conf
[1] 13.00955 19.99045
$VC$out
numeric(0)
If you just want the 5-number summaries, you can extract them as follows:
sapply(bps,"[[","stats")
       OJ   VC
[1,]  8.2  4.2
[2,] 15.2 11.2
[3,] 22.7 16.5
[4,] 25.8 23.3
[5,] 30.9 33.9
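The same steps can be applied to the asker's own variables. The sketch below uses the simplified sample data posted in the comments (not the original 444-value vectors); the names `bps_medal` and `five_by_medal` and the row labels are illustrative choices, not part of the original code.

```r
# Simplified sample data from the asker's follow-up comment
age <- c(15, 15, 16, 17, 20, 21, 21, 21, 25)
medal <- c(0, 0, 1, 0, 0, 1, 1, 0, 0)
medal.f <- factor(medal, labels = c("No Medal", "Medal"))

# One boxplot.stats() result per medal group
bps_medal <- lapply(split(age, medal.f), boxplot.stats)

# Keep only the five-number summaries, one column per group
five_by_medal <- sapply(bps_medal, "[[", "stats")
rownames(five_by_medal) <- c("lower whisker", "lower hinge", "median",
                             "upper hinge", "upper whisker")
five_by_medal

# What the question was reaching for: summary() applied within each group
tapply(age, medal.f, summary)
```

Note that `summary()` computes quartiles with `quantile()`, so its first and third quartiles can differ slightly from the hinges that the box plot draws.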

Ben Bolker
- Thanks for the response! `split(age,medal.f)` listed everything in the two groups and separated it like I needed, but I'm still confused about how to take the 5-number summary of those splits. I tried using `with` like you suggested, but that didn't get me the result I needed. – user2821333 Dec 08 '15 at 18:25
- If this doesn't work, you definitely need to provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) ... – Ben Bolker Dec 08 '15 at 18:28
- `library(coin); age <- c(15,15,16,17,20,21,21,21,25); medal <- c(0,0,1,0,0,1,1,0,0); medal.f <- factor(medal, labels = c("No Medal", "Medal")); wilcox_test(age~medal.f); boxplot(age~medal.f, main="Age vs. Medals", ylab="Age", col=(c("darkolivegreen1","lavender"))); split(age,medal.f)` – user2821333 Dec 08 '15 at 18:40
- You should add this information to your question (although it doesn't match the boxplot you've shown) ... what's wrong with `lapply(split(age,medal.f), boxplot.stats)` ... ??? – Ben Bolker Dec 08 '15 at 18:46
- I simplified the data because my original vectors had 444 numbers in them, but the answer you just edited fixed my problem. Thanks so much! – user2821333 Dec 08 '15 at 18:50
- If this answered your question, you're encouraged to click on the check-mark to accept it ... – Ben Bolker Dec 08 '15 at 19:38