I would like to visualize a data frame much like the following in a plot:

grade number
  A     2
  B     6
  C     1
  D     0
  E     1

The idea is to have the grades on the x-axis as categories and the number of pupils who received the respective grade on the y-axis.

My task is to display them not as points like in a line chart, but as thickness above the category like in a violin plot. This is really about the pure visuals of it.

I tried ggplot2's geom_violin, but it always takes the values of the number column for the y-axis. The y-axis is supposed to have just one single dimension: the level around which the density plot is mirrored.

I'd be very happy if someone had a hint on how I should restructure my data, or could tell me whether I am completely mistaken with my approach.

Ah, yes: on top I'd like to display the grade-point-average as a small bar.

Thank you very much in advance for taking your time. I'm sure the solution is very obvious, but I just don't see it.

eipi10
  • Welcome to SO, you may like to review some of the advice in [this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to write a good question. As it stands your question is in danger of being closed. You would do well to edit it to provide an example of the code you tried, and what your desired result looks like (more carefully described). Also, make the question less conversational and more to the point. Finally, it is not GNU R. Just R will do as there are various flavours – dww Oct 10 '16 at 22:02
  • You should provide a better [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). There's no way to show any density like you would in a violin plot with just a single number for each grade. That doesn't make any sense. – MrFlick Oct 10 '16 at 22:03
  • With 5 categories a smoothed line seems like overkill, why not just use a barplot or histogram? – Gregor Thomas Oct 10 '16 at 22:42
  • I also sensed that from an analytical point of view this might seem excessive. The dataset is far too simple (but this is what my use case looks like, sorry). What I really had in mind was a better (let's say fancier) way to illustrate this simple dataset. Thanks a lot for your time. – schultzandschultz Oct 21 '16 at 13:50

1 Answer

As @Gregor mentioned, a smoothed density estimate (which is what a violin plot is) with just five ordinal values isn't really appropriate here. Even if you had plus/minus grades, you'd still probably be better off with bars or lines. See below for a few options:

library(ggplot2)

# Fake data
dat = data.frame(grades=LETTERS[c(1:4,6)],
                 count=c(5,12,11,5,3), stringsAsFactors=FALSE)

# Reusable plot elements
thm = list(theme_bw(),
           scale_y_continuous(limits=c(0,max(dat$count)), breaks=seq(0,20,2)),
           labs(x="Grade", y="Count"))

# Option 1: bar chart with count and percentage labels inside the bars
ggplot(dat, aes(grades, count)) +
  geom_bar(stat="identity", fill=hcl(240,100,50)) +
  geom_text(aes(y=0.5*count, label=paste0(count, " (", sprintf("%1.1f", count/sum(count)*100),"%)")),
            colour="white", size=3) +
  thm

# Option 2: points connected by a faint line
ggplot(dat, aes(grades, count)) +
  geom_line(aes(group=1),alpha=0.4) +
  geom_point() +
  thm

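# Option 3: filled area; grades become numeric positions 1-5, so the breaks
# are set explicitly to guarantee they line up with the five labels
ggplot(dat, aes(x=as.numeric(factor(grades)))) +
  geom_ribbon(aes(ymin=0, ymax=count), fill="grey80") +
  geom_text(aes(y=count, label=paste0(sprintf("%1.1f", count/sum(count)*100),"%")), size=3) +
  scale_x_continuous(breaks=1:5, labels=LETTERS[c(1:4,6)]) +
  thm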

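The question also asks for the grade-point average to be marked on the plot. A minimal sketch of one way to do that, assuming the grades can be mapped to positions 1 to 5 (A = 1, ..., F = 5) and that the average is simply the count-weighted mean of those positions. The `pos` column and the `gpa` variable are names introduced here for illustration, and the data is the same fake data as above:

```r
library(ggplot2)

# Same fake data as in the answer
dat <- data.frame(grades = LETTERS[c(1:4, 6)],
                  count  = c(5, 12, 11, 5, 3), stringsAsFactors = FALSE)

# Assumption: each grade sits at an integer position 1..5 on the x-axis
dat$pos <- seq_len(nrow(dat))

# Count-weighted mean position, used here as the "grade-point average"
gpa <- weighted.mean(dat$pos, dat$count)

# Bars at the numeric positions, with a vertical line at the average
p <- ggplot(dat, aes(pos, count)) +
  geom_col(fill = "grey70") +
  geom_vline(xintercept = gpa, colour = "red") +
  scale_x_continuous(breaks = dat$pos, labels = dat$grades) +
  labs(x = "Grade", y = "Count") +
  theme_bw()
print(p)
```

Putting the grades on a continuous axis is what allows the average to fall between two categories. With a discrete axis the line could only sit on a grade itself.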

eipi10
  • Thank you very much for the detailed answer. First of all, this is really helpful. As you might have noticed, I'm a beginner-level autodidact, so every hint of a solution to work with is a big relief. – schultzandschultz Oct 21 '16 at 13:33
  • But (in addition), since this is about the visuals: shouldn't it be possible (speaking abstractly, I have no idea how this would work code-wise) to take the geom_ribbon plot and mirror it around a single axis? That is, treat each count not as a distance from the x-axis in the positive y-direction, but as a distance from a fixed line (let's say the x-axis) in both the positive AND negative directions. Category A would get a value of 2.5 and a mirrored value of -2.5, and likewise for category B and so on. My instinct said this is how a violin plot works, but maybe it is simpler to customize it this way. – schultzandschultz Oct 21 '16 at 13:43
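
The mirroring idea in the last comment can be sketched with `geom_ribbon` by giving it a negative lower bound. This is only an illustration under stated assumptions: it splits each count in half so the total thickness equals the count, it reuses the fake data from the answer, and the `pos` and `half` columns are names made up for this example. The result is a straight-edged shape, not a smoothed density estimate like a real violin plot.

```r
library(ggplot2)

# Same fake data as in the answer
dat <- data.frame(grades = LETTERS[c(1:4, 6)],
                  count  = c(5, 12, 11, 5, 3), stringsAsFactors = FALSE)

# Numeric x positions for the grades, and half of each count
dat$pos  <- seq_len(nrow(dat))
dat$half <- dat$count / 2

# Ribbon from -half to +half, so the shape is symmetric around y = 0
p <- ggplot(dat, aes(x = pos)) +
  geom_ribbon(aes(ymin = -half, ymax = half), fill = "grey80") +
  scale_x_continuous(breaks = dat$pos, labels = dat$grades) +
  labs(x = "Grade", y = NULL) +
  theme_bw() +
  # The y values carry no meaning of their own here, so hide them
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
print(p)
```

Adding `coord_flip()` would turn the shape upright if a vertical, violin-like orientation is preferred.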