
In the recent TIMSS report that I happened to come across, there's a plot (shown below) that in my opinion is very communicative. I've read that such plots are called Cleveland dot plots, though this one adds confidence intervals as well. I was wondering whether it can be reproduced in ggplot2 or matplotlib. All hints are welcome.

[Figure: plot from the TIMSS 2015 report (source: timss2015.org)]

Glorfindel
John Smith
  • Can you please include data that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Ben Bolker Dec 02 '16 at 15:08
  • I believe the data for the plot is [here](http://timss2015.org/wp-content/uploads/filebase/science/1.-student-achievement/1_1_science-distribution-of-science-achievement-grade-4.xls) – John Smith Dec 02 '16 at 16:21

2 Answers


Using the iris data set:

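library(dplyr)
library(ggplot2)

# Per-species summary of Sepal.Length: mean, sd, n and the 5th/25th/75th/95th percentiles
# (summarise_each() and funs() are deprecated in current dplyr, so summarise() is used)
plot_data <- iris %>%
  group_by(Species) %>%
  summarise(mean = mean(Sepal.Length),
            sd   = sd(Sepal.Length),
            n    = n(),
            q95  = quantile(Sepal.Length, 0.95),
            q75  = quantile(Sepal.Length, 0.75),
            q25  = quantile(Sepal.Length, 0.25),
            q5   = quantile(Sepal.Length, 0.05)) %>%
  # Approximate 95% confidence interval for the mean: mean +/- 2 standard errors
  mutate(se = sd / sqrt(n),
         left95 = mean - 2 * se,
         right95 = mean + 2 * se)

# Layer from widest to narrowest: 5th-95th percentile, 25th-75th percentile, then the CI
ggplot(plot_data, aes(x = Species, y = mean)) +
  geom_crossbar(aes(ymin = q5, ymax = q95), fill = "aquamarine1", color = "aquamarine1", width = 0.2) +
  geom_crossbar(aes(ymin = q25, ymax = q75), fill = "aquamarine4", color = "aquamarine4", width = 0.2) +
  geom_crossbar(aes(ymin = left95, ymax = right95), fill = "black", color = "black", width = 0.2) +
  coord_flip() +
  theme_minimal()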


This should give you the gist of how to build such a plot with ggplot2. The data you linked can be used directly, without the dplyr summarizing step.
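If your table already holds per-group summaries, the plotting step can take it as-is. The sketch below is only an illustration: plot_summary() is a hypothetical helper, and it assumes columns named group, mean, se, q5, q25, q75 and q95 (the real column names in the TIMSS spreadsheet will differ and need mapping). It also sorts the groups by their mean with reorder(), the usual ordering for a Cleveland dot plot. Because no such table is at hand here, a stand-in is built from iris with base R only.

```r
library(ggplot2)

# Hypothetical helper: expects one row per group with columns
# group, mean, se, q5, q25, q75, q95 (already summarised, e.g. read from a spreadsheet)
plot_summary <- function(df) {
  # Order groups by their mean, as is usual for a Cleveland dot plot
  df$group <- reorder(df$group, df$mean)
  ggplot(df, aes(x = group, y = mean)) +
    geom_crossbar(aes(ymin = q5, ymax = q95), fill = "aquamarine1", color = "aquamarine1", width = 0.2) +
    geom_crossbar(aes(ymin = q25, ymax = q75), fill = "aquamarine4", color = "aquamarine4", width = 0.2) +
    # Approximate 95% CI for the mean: mean +/- 2 standard errors
    geom_crossbar(aes(ymin = mean - 2 * se, ymax = mean + 2 * se), fill = "black", color = "black", width = 0.2) +
    coord_flip() +
    theme_minimal()
}

# Stand-in summary table built from iris with base R (no dplyr)
x <- split(iris$Sepal.Length, iris$Species)
summary_df <- data.frame(
  group = names(x),
  mean  = sapply(x, mean),
  se    = sapply(x, function(v) sd(v) / sqrt(length(v))),
  q5    = sapply(x, quantile, probs = 0.05),
  q25   = sapply(x, quantile, probs = 0.25),
  q75   = sapply(x, quantile, probs = 0.75),
  q95   = sapply(x, quantile, probs = 0.95)
)

p <- plot_summary(summary_df)
print(p)
```

With a real spreadsheet, you would replace the stand-in table with the imported data after renaming its columns to match.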

Jake Kaupp

A Cleveland dot plot displays every value of a dataset as a point, ordered along the x-axis simply by its position in the dataset (rather than the averages shown in the other answer). Using ggplot2, with the iris dataset again as an example:

library(ggplot2)

# x = position of each row in the dataset, y = the observed value
ggplot(iris) + geom_point(aes(y = Sepal.Length, x = seq(1, length(Sepal.Length), 1)))

If you have a unique ID for each row, you can use it instead of x = seq(1, length(Sepal.Length), 1); you need to supply some x either way, since both x and y are required aesthetics for geom_point.

[Figure: Cleveland dot plot]
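To illustrate the unique-ID variant: iris has no ID column, so this sketch adds a hypothetical id column with seq_len() and maps it to x directly, instead of computing the position inside aes().

```r
library(ggplot2)

# iris has no ID column, so add a hypothetical one for illustration
iris_ids <- transform(iris, id = seq_len(nrow(iris)))

# Map the ID column to x; y is the observed value
p_ids <- ggplot(iris_ids, aes(x = id, y = Sepal.Length)) +
  geom_point()
print(p_ids)
```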