This is a question both about best practices for visual representation of data and about how to draw plots in R/ggplot2.
I am trying to find a way to graphically represent the story told here:
"We had 2000 test cases, of which 500 had errors. After investigation, we found that 400 of the tests were Big and 1600 were Small; only 25 of the Big tests had errors, so we set them aside, leaving 1600 Small tests, of which 475 had errors. We then found that 400 of the Small tests were Clockwise and 1200 were Counter-Clockwise; only 20 of the Small Clockwise tests had errors, so we set them aside, leaving 1200 Small Counter-Clockwise tests, of which 455 had errors."
In other words, I am using categories to separate my test cases, and I want to represent how the fraction of errors changes at each step as categories are set aside.
Here's some R with the data:
    tests <- data.frame(
      n.all = c(2000, 400, 1600, 400, 1200),
      n.err = c(500, 25, 475, 20, 455),
      sep.1 = as.factor(c("all", "Big", "Small", "Small", "Small")),
      sep.2 = as.factor(c("all", "all", "all", "Clockwise", "Counter-Clockwise"))
    )
With this small amount of data, a simple numeric table might be the best choice, but let's assume that the story continues, with more and more separating categories being used, so that simply listing the numbers no longer works well.
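For reference, the plain numeric table could be sketched like this. The column `err.pct` is a name I made up for illustration; it is just the percentage of tests with errors in each row.

```r
# Same data as above, repeated so the snippet runs on its own
tests <- data.frame(
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455),
  sep.1 = as.factor(c("all", "Big", "Small", "Small", "Small")),
  sep.2 = as.factor(c("all", "all", "all", "Clockwise", "Counter-Clockwise"))
)

# err.pct (hypothetical column name): percentage of tests with errors per row
tests$err.pct <- round(100 * tests$n.err / tests$n.all, 2)

# Print the table of counts and error percentages
print(tests)
```

This is readable with five rows, but it gets hard to follow once more separating categories are added.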
What would be a good way to represent this data? I can think of a few possibilities:
- Pie charts, showing slices of the pie being taken away, and the breakdown of errors/no errors in what remains
- Bar charts, used the same way: bars being taken away, with the errors/no errors breakdown in what remains
- Bar charts with ribbons showing the "flow" of separating away categories, like Minard's chart of Napoleon's march
- The same ribbon layout, but with the bar charts showing the fractions horizontally rather than vertically
All four methods show the absolute number of test cases decreasing, as well as the fraction of errors both in the separated category and in what remains. I think I like the fourth option best, but I've got an open mind.
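As a starting point, here is a rough sketch of the plain bar chart idea with horizontal bars, assuming ggplot2 is installed. It does not draw the ribbons or the "flow" between steps. The names `stage`, `outcome`, and `long` are placeholders I made up for illustration.

```r
library(ggplot2)

# Same data as above, repeated so the snippet runs on its own
tests <- data.frame(
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455),
  sep.1 = as.factor(c("all", "Big", "Small", "Small", "Small")),
  sep.2 = as.factor(c("all", "all", "all", "Clockwise", "Counter-Clockwise"))
)

# One label per row (hypothetical labels); the levels are reversed so that
# the first row ends up at the top of the plot after coord_flip()
labels <- c("All", "Big", "Small",
            "Small / Clockwise", "Small / Counter-Clockwise")
tests$stage <- factor(labels, levels = rev(labels))

# Long format: one row per stage and outcome
long <- rbind(
  data.frame(stage = tests$stage, outcome = "error",
             n = tests$n.err),
  data.frame(stage = tests$stage, outcome = "no error",
             n = tests$n.all - tests$n.err)
)

# Stacked horizontal bars: bar length is the number of tests in that row,
# and the fill shows how many of them had errors
p <- ggplot(long, aes(x = stage, y = n, fill = outcome)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Number of test cases", fill = NULL)

print(p)
```

This shows the shrinking totals and the error share in each row, but not which group each row was split from, which is the part I would want the ribbons for.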
How should this kind of data be represented, and can R/ggplot2 be used to do so?