
This is a question both about best practices for visual representation of data and about how to draw plots in R/ggplot2.

I am trying to find a way to graphically represent the story told here:

"We had 2000 test cases, of which 500 had errors. After investigation, we found that 400 of the tests were Big and 1600 were Small; only 25 of the Big tests had errors, so we set them aside, leaving 1600 Small tests, of which 475 had errors. We then found that 400 of the Small tests were Clockwise and 1200 were Counter-Clockwise; only 20 of the Small Clockwise tests had errors, so we set them aside, leaving 1200 Small Counter-Clockwise tests, of which 455 had errors."

In other words, I am using categories to separate my test cases, and I want to represent how the fraction of errors in each category changes with my progress.

Here's some R with the data:

tests <- data.frame(
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455),
  sep.1 = as.factor(c("all", "Big", "Small", "Small", "Small")),
  sep.2 = as.factor(c("all", "all", "all", "Clockwise", "Counter-Clockwise"))
)

With this small amount of data, a simple numeric table might be the best choice; let's assume that the story continues, with more and more separating categories being used, so that simply listing the numbers isn't the best choice.
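
For reference, the numeric table mentioned above can be produced directly from the data frame. This is only a minimal sketch in base R; the column names `n.ok` and `err.rate` are introduced here for illustration and are not part of the original data.

```r
# Data from the question, one row per group of tests.
tests <- data.frame(
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455),
  sep.1 = factor(c("all", "Big", "Small", "Small", "Small")),
  sep.2 = factor(c("all", "all", "all", "Clockwise", "Counter-Clockwise"))
)

# Illustrative derived columns (assumed names): tests without errors,
# and the fraction of tests in each group that had errors.
tests$n.ok <- tests$n.all - tests$n.err
tests$err.rate <- tests$n.err / tests$n.all

print(tests)
```

The error fraction rises in the remaining pool at each step (all tests, then Small, then Small Counter-Clockwise), which is the progression the plot is meant to show.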

What would be a good way to represent this data? I can think of a few possibilities:

[Figure: four possible plots: pie, bar, bar with path, horizontal bar with path]

  1. Pie charts, showing slices of the pie being taken away, and the breakdown of errors/no errors in what remains
  2. Bar charts, similar
  3. Bar charts with ribbons showing the "flow" of separating away categories, like Minard's chart of Napoleon's march
  4. Similar, but with the bar charts showing the fractions horizontally rather than vertically

All four methods show the absolute number of test cases decreasing, as well as the fraction of errors both in the separated category and in what remains. I think I like #4 best, but I've got an open mind.

How should this kind of data be represented, and can R/ggplot2 be used to do so?

andrewtinka
  • Look at package 'vcd'. But that said I think this is not a good question for SO. – IRTFM Jun 10 '13 at 19:42
  • I think this is an interesting question, but I agree that it might be a bit too open ended for StackOverflow. CrossValidated also sometimes welcomes data visualization questions that are more conceptual than programming related. I would perhaps ask there in chat or something before asking though, just to be sure. – joran Jun 10 '13 at 19:57
  • Yep, once you know what visualisation you want, bring it back if you're having difficulty with the code – alexwhan Jun 10 '13 at 23:19
  • +1 for cute hand-drawn pictures. But agree with others - too vague as it stands with no opportunity to help with code. – SlowLearner Jun 11 '13 at 05:51

1 Answer


Remember the three things that should line up when drawing a graph: the message you want to tell, the message the data tells, and the message the graph tells. In my opinion, your option 4 is the best one for getting the message across consistently.

I also arrive at number 4 by sheer elimination ;)

Columns are not suitable, since you would be combining a vertical representation with a horizontal flow. Comparing pie charts is not easy either (even within a single pie chart it is already difficult to compare the different slices), so they are not an option. That leaves you with option 4 indeed :)
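
As a rough starting point for option 4, here is a minimal ggplot2 sketch that draws one horizontal stacked bar per group, with bar length proportional to the number of tests. It does not draw the connecting ribbons from the hand sketch, and the row labels and the `long` data frame are my own naming, so treat it as an assumption-laden sketch rather than a finished plot.

```r
library(ggplot2)

# Counts from the question; the labels are illustrative names for each row.
tests <- data.frame(
  label = c("All", "Big (set aside)", "Small",
            "Small Clockwise (set aside)", "Small Counter-Clockwise"),
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455)
)

# Long format: one row per (group, outcome) pair.
long <- data.frame(
  label   = rep(tests$label, times = 2),
  outcome = rep(c("error", "no error"), each = nrow(tests)),
  n       = c(tests$n.err, tests$n.all - tests$n.err)
)

# Reverse the level order so "All" ends up at the top after coord_flip().
long$label <- factor(long$label, levels = rev(tests$label))

p <- ggplot(long, aes(x = label, y = n, fill = outcome)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Number of tests", fill = NULL)

print(p)
```

Because the bars share a common axis, both the shrinking pool of tests and the share of errors in each group can be read off the same scale.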

You can also try a Sankey diagram. The question "Sankey Diagrams in R?" might be helpful.
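
A minimal sketch of the Sankey idea, assuming the networkD3 package is available (the choice of package is mine, not something from the question). The node names and the final split into "Error" and "No error" are my own way of encoding the numbers; `sankeyNetwork()` expects zero-based node indices in the link table.

```r
# Node order defines the zero-based indices used in `links`.
nodes <- data.frame(
  name = c("All", "Big", "Small", "Small Clockwise",
           "Small Counter-Clockwise", "Error", "No error")
)

# Each row is one flow: source node, target node, number of tests.
links <- data.frame(
  source = c(0, 0, 2, 2, 1, 1, 3, 3, 4, 4),
  target = c(1, 2, 3, 4, 5, 6, 5, 6, 5, 6),
  value  = c(400, 1600, 400, 1200, 25, 375, 20, 380, 455, 745)
)

# Only build the widget if networkD3 is installed (assumed dependency).
if (requireNamespace("networkD3", quietly = TRUE)) {
  sankey <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target",
    Value = "value", NodeID = "name",
    units = "tests"
  )
  print(sankey)
}
```

The flows into "Error" add up to the 500 errors in the full set, so the diagram keeps the totals from the story intact while showing where they come from.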

Chrisvdberge