In the recent TIMSS report that I happened to come across, there's a plot (shown below) that in my opinion is very communicative. I've read that such plots are called Cleveland dot plots, though this one adds confidence intervals as well. I was wondering if it can be reproduced in ggplot2 or matplotlib. All hints are welcome.
(source: timss2015.org)
Viewed 2,260 times
4

Glorfindel

John Smith
-
Can you please include data that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) ? – Ben Bolker Dec 02 '16 at 15:08
-
I believe the data for the plot is [here](http://timss2015.org/wp-content/uploads/filebase/science/1.-student-achievement/1_1_science-distribution-of-science-achievement-grade-4.xls) – John Smith Dec 02 '16 at 16:21
2 Answers
4
Using the `iris` data set:
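library(dplyr)
library(ggplot2)

# Per-species summary of Sepal.Length: mean, spread, and percentiles
plot_data <- iris %>%
  group_by(Species) %>%
  summarise(
    mean = mean(Sepal.Length),
    sd   = sd(Sepal.Length),
    n    = n(),
    q95  = quantile(Sepal.Length, 0.95),
    q75  = quantile(Sepal.Length, 0.75),
    q25  = quantile(Sepal.Length, 0.25),
    q5   = quantile(Sepal.Length, 0.05)
  ) %>%
  # Approximate 95% confidence interval for the mean (mean +/- 2 standard errors)
  mutate(se = sd / sqrt(n),
         left95 = mean - 2 * se,
         right95 = mean + 2 * se)

# Layer the bars from widest to narrowest: 5th-95th percentiles,
# 25th-75th percentiles, then the confidence interval for the mean
ggplot(plot_data, aes(x = Species, y = mean)) +
  geom_crossbar(aes(ymin = q5, ymax = q95), fill = "aquamarine1", color = "aquamarine1", width = 0.2) +
  geom_crossbar(aes(ymin = q25, ymax = q75), fill = "aquamarine4", color = "aquamarine4", width = 0.2) +
  geom_crossbar(aes(ymin = left95, ymax = right95), fill = "black", color = "black", width = 0.2) +
  coord_flip() +
  theme_minimal()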
This should give you the gist of how to use `ggplot2` to accomplish this. The data you provided can easily be used directly, without the `dplyr` summarizing step.
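As a rough sketch of that last point, the same layers can be drawn from a table that already holds the summary statistics. The data frame below is entirely hypothetical: the column names (`country`, `mean`, `se`, `p5`, `p25`, `p75`, `p95`) and all values are placeholders, not figures from the TIMSS spreadsheet, so they would need to be mapped to whatever the real file contains. Sorting the rows by mean with `reorder()` is an added assumption about how the chart should be ordered.

```r
library(ggplot2)

# Hypothetical pre-summarised data: names and numbers are placeholders,
# NOT values taken from the TIMSS report.
achievement <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  mean    = c(560, 540, 505, 470),
  se      = c(2.5, 3.0, 2.0, 4.0),
  p5      = c(430, 400, 370, 320),
  p25     = c(510, 490, 455, 410),
  p75     = c(615, 595, 560, 530),
  p95     = c(680, 665, 630, 610)
)

# Approximate 95% confidence interval for the mean (mean +/- 2 standard errors)
achievement$lower <- achievement$mean - 2 * achievement$se
achievement$upper <- achievement$mean + 2 * achievement$se

# reorder() sorts the categories by their mean, so after coord_flip()
# the highest-scoring row ends up at the top
p <- ggplot(achievement, aes(x = reorder(country, mean), y = mean)) +
  geom_crossbar(aes(ymin = p5, ymax = p95), fill = "grey85", colour = "grey85", width = 0.3) +
  geom_crossbar(aes(ymin = p25, ymax = p75), fill = "grey60", colour = "grey60", width = 0.3) +
  geom_crossbar(aes(ymin = lower, ymax = upper), fill = "black", colour = "black", width = 0.3) +
  coord_flip() +
  labs(x = NULL, y = "Score") +
  theme_minimal()

print(p)
```

No `dplyr` step is needed here because each row is already one summarised group.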

Jake Kaupp
3
A Cleveland dot plot displays every value in a dataset as a point, ordered along the x-axis simply by its position in the dataset (not the averages, as in the other answer). Using `ggplot2` (and the `iris` dataset again as an example):
ggplot(iris) + geom_point(aes(x = seq_along(Sepal.Length), y = Sepal.Length))
If you have a unique ID for each row, you can use that instead of `seq_along(Sepal.Length)`; an x value is needed either way, since both `x` and `y` are required aesthetics for `geom_point`.

Simone Bianchi
-
Pretty sure Cleveland would recommend sorting them in a meaningful way instead of just by row order, but I don't have a reference in front of me to quote from. – Aaron left Stack Overflow Oct 08 '19 at 05:32
-
I guess @Aaron is right: https://en.wikipedia.org/wiki/Dot_plot_(statistics)#Cleveland_dot_plots – Simone Bianchi Oct 09 '19 at 13:15
-
And for a primary reference, just look at the cover of his [Visualizing Data](https://www.amazon.com/Visualizing-Data-William-S-Cleveland/dp/0963488406) book. – Aaron left Stack Overflow Oct 09 '19 at 14:34
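One possible way to act on the sorting suggestion in these comments, sketched with the same `iris` data: order the observations by value before plotting, so that position on the axis reflects rank rather than row order. The `iris_sorted` data frame and its `rank` column are names introduced here for illustration, and colouring by `Species` is an optional extra rather than part of either answer.

```r
library(ggplot2)

# Order the observations by value instead of by row position
iris_sorted <- iris[order(iris$Sepal.Length), ]
iris_sorted$rank <- seq_len(nrow(iris_sorted))

# Value on the x-axis, rank on the y-axis: one dot per observation
p <- ggplot(iris_sorted, aes(x = Sepal.Length, y = rank, colour = Species)) +
  geom_point() +
  labs(x = "Sepal length (cm)", y = "Rank (sorted by sepal length)") +
  theme_minimal()

print(p)
```

With labelled categories instead of raw observations, `reorder()` inside `aes()` would achieve the same kind of ordering.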