
Jason Bryer's likert package has many useful features for plotting diverging bar charts of Likert-type data. However, one basic feature seems to be missing: there does not appear to be any way to show the total number of sample points for each question/group on a bar chart. If one includes the histogram, these n-values appear there, but I often find that the histogram makes the entire plot too busy.

For example, using the pisaitems dataset that ships with likert, I can plot a diverging bar chart of results grouped by country:

 library(likert)
 data(pisaitems)

 items28 <- pisaitems[, substr(names(pisaitems), 1, 5) == "ST24Q"]

 # Create the likert object using country as a grouping variable.
 l28g <- likert(items28, grouping = pisaitems$CNT)

 # Optional - print a summary.
 print(l28g)

 # Plot the bar chart.
 plot(l28g)

The resulting plot is a diverging bar chart of the ST24Q items, with one bar per country.

But unless I also include a histogram (which I don't want to do), there is no option to report the number of data points behind each group/question. Just by looking at the bar chart, I have no way of knowing whether the results are based on 5,000 responses or 10. This information is easy to get from the underlying data; for example, the following code gives the number of non-missing responses to question ST24Q01 for each country:

 margin.table(table(pisaitems$CNT, items28$ST24Q01), 1)
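Repeating that table() call for every item gets tedious. The base R sketch below, run on a small hypothetical stand-in for pisaitems (the data frame df and its columns Q1/Q2 are made up), builds the whole country-by-item grid of counts in one step. Like table(), it only counts non-missing answers, so the counts can differ between questions when some answers are missing.

```r
# Hypothetical stand-in for pisaitems: a grouping column plus two
# Likert items, with some missing answers.
df <- data.frame(
  CNT = c("A", "A", "A", "B", "B"),
  Q1  = c("Agree", NA, "Disagree", "Agree", "Agree"),
  Q2  = c("Agree", "Agree", "Agree", NA, NA)
)
items <- c("Q1", "Q2")

# For each item (column), count the non-missing answers per country (row)
n_by_item <- sapply(df[items], function(x) tapply(!is.na(x), df$CNT, sum))
print(n_by_item)
```

On the real data, something like sapply(items28, function(x) tapply(!is.na(x), pisaitems$CNT, sum)) should give one column per ST24Q item.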

Ideally, I could create the plot and report the n-value for each bar (i.e., each question/country) somewhere on the graph, perhaps off the right-hand side, as I believe the HH package does.

I've fooled around with the likert function, but so far I have been unable to figure out how to include the n-values in its output and then carry them through to the final plot.

Any insights much appreciated!

MH765

1 Answer


In this case the counts don't vary by question, so you only need a single table of response counts. Below are two approaches: putting the number of responses next to each question (useful when the counts do vary), or showing them in a single table.

Add Number of Responses by Question

One way to do this would be to modify the underlying code of likert.bar.plot so that it can add response counts. Here I've instead just hacked the output of likert.bar.plot to add the response counts after the fact.

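library(likert)     # l28g is the likert object created in the question
library(ggplot2)
library(grid)       # grid.draw(), unit()
library(dplyr)
library(gridExtra)
library(reshape2)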

First, get response counts by Item for each CNT. The variable=NA at the end is needed because the data frame that likert.bar.plot builds for the plot contains, and uses, a column called variable. Even though our subsequent geom_text layer doesn't use that column, ggplot still expects it to be present in the new data frame.

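counts = pisaitems %>%
  select(CNT, matches("ST24Q")) %>%             # country plus the ST24Q items
  melt(id.var="CNT", variable.name="Item") %>%  # long format: one row per country/item/answer
  count(CNT, Item) %>%                          # rows per country and item (missing answers included)
  mutate(variable=NA)                           # dummy column the likert plot expects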
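If the dummy column feels fragile, ggplot2's inherit.aes = FALSE argument makes a layer ignore the plot-level aesthetics, so its data frame only needs the columns that the layer itself maps. Below is a minimal sketch on hypothetical toy data (bars and counts are made up), assuming, without having checked likert's source, that the missing-column error comes from a plot-level aesthetic that refers to variable, like the fill = variable mapping here:

```r
library(ggplot2)

# Hypothetical long-format data similar in shape to a likert bar plot:
# one row per group and response level, with the level in `variable`.
bars <- data.frame(
  Group    = rep(c("A", "B"), each = 2),
  variable = rep(c("Disagree", "Agree"), times = 2),
  value    = c(-40, 60, -30, 70)
)
# Counts table with no `variable` column at all
counts <- data.frame(Group = c("A", "B"), n = c(120, 4500))

p <- ggplot(bars, aes(x = Group, y = value, fill = variable)) +
  geom_col() +
  # inherit.aes = FALSE: this layer does not pick up fill = variable,
  # so `counts` does not need a dummy `variable` column
  geom_text(data = counts,
            aes(x = Group, y = 110, label = format(n, big.mark = ",", trim = TRUE)),
            inherit.aes = FALSE, hjust = 1, size = 2.5) +
  coord_flip()

built <- ggplot_build(p)
```

Applied to the answer, this would mean adding inherit.aes = FALSE to the geom_text() call and dropping mutate(variable=NA); I haven't tested that against likert.bar.plot's output.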

We use geom_text to add response counts by item, but we need to make a few other changes to the output of plot(l28g), as follows:

  1. Expand the y-axis limits out to 150 with scale_y_continuous so that the text values (which I've placed at y = 145) will be visible. Because the chart is flipped, the y-axis is the horizontal one. This overrides the y scale of the original plot created by plot(l28g), which calls likert.bar.plot to actually produce the plot.

  2. Set the visible y-axis range to stop at 110. We do this inside coord_flip(), which overrides the original coord_flip() from likert.bar.plot. Unlike scale limits, coordinate limits only zoom the view without dropping data, so the counts at 145 are kept but end up just to the right of the plot area rather than inside it.

  3. Increase the right plot margin, so that there will be some space to the right of the plot.

  4. Turn off clipping, so that text printed outside the plot area will be visible.
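Steps 1 and 2 rely on the difference between scale limits and coordinate limits in ggplot2. A minimal sketch with a single hypothetical label at y = 145 (the lab data frame is made up): scale limits that exclude 145 censor the value, while coordinate limits only change what is shown and leave the data alone.

```r
library(ggplot2)

# One hypothetical count label positioned at y = 145
lab <- data.frame(x = "A", y = 145, label = "n = 120")
base <- ggplot(lab, aes(x = x, y = y, label = label)) + geom_text()

# Scale limits censor: 145 lies outside c(-100, 100), so it is set to NA
p_scale <- base + scale_y_continuous(limits = c(-100, 100))

# Scale wide enough to keep 145; coord limits then only zoom the view
p_coord <- base + scale_y_continuous(limits = c(-100, 150)) +
  coord_flip(ylim = c(-110, 110))

y_scale <- ggplot_build(p_scale)$data[[1]]$y
y_coord <- ggplot_build(p_coord)$data[[1]]$y
print(y_scale)
print(y_coord)
```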

Here's the plot code. It might take several seconds to render, so be patient.

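p = plot(l28g) + 
  # Response counts, right-aligned at y = 145 (outside the visible range)
  geom_text(data=counts,
            aes(label=format(n,big.mark=","), x=CNT, y=145), 
            size=2.5, colour="grey30", hjust=1) +
  # Step 1: extend the scale so that y = 145 is not dropped
  scale_y_continuous(limits=c(-100,150)) +
  # Step 2: show only -110 to 110 inside the panel
  coord_flip(ylim=c(-110,110)) +
  # Step 3: widen the right margin to make room for the counts
  theme(plot.margin=unit(c(0.2,2,0.2,0.2),"cm"))

# Step 4: turn off clipping
# http://stackoverflow.com/a/9691256/496488
p <- ggplot_gtable(ggplot_build(p))
p$layout$clip <- "off"
grid.draw(p)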
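Newer ggplot2 releases (3.0.0 and later, as far as I know) give coord_flip() a clip argument, which should make the gtable edit unnecessary. A sketch on hypothetical toy data (df and its columns are made up) that combines steps 1 to 4 in a single ggplot object:

```r
library(ggplot2)

# Hypothetical toy data: one bar per group plus its response count
df <- data.frame(Group = c("A", "B"), value = c(40, 70), n = c(120, 4500))

p <- ggplot(df, aes(x = Group, y = value)) +
  geom_col() +
  # Counts right-aligned at y = 145, outside the visible range
  geom_text(aes(y = 145, label = format(n, big.mark = ",", trim = TRUE)),
            hjust = 1, size = 2.5, colour = "grey30") +
  # Step 1: keep y = 145 in the data
  scale_y_continuous(limits = c(-100, 150)) +
  # Steps 2 and 4: show only -110 to 110 and draw text outside the panel
  coord_flip(ylim = c(-110, 110), clip = "off") +
  # Step 3: extra room on the right for the counts
  theme(plot.margin = margin(0.2, 2, 0.2, 0.2, unit = "cm"))

print(p)
```

With likert, the same idea would be to add clip = "off" to the coord_flip() call above and use print(p) instead of the ggplot_gtable()/grid.draw() lines; untested against likert.bar.plot.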


Add Number of Responses in a Single Table

One option is to create a table grob (grob = graphical object) and lay it out alongside or below the main plot. For example:

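library(likert)     # l28g is the likert object created in the question
library(grid)       # nullGrob(), textGrob(), gpar()
library(dplyr)
library(gridExtra)  # ttheme_default(), tableGrob(), arrangeGrob(), grid.arrange()
library(reshape2)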

# Table theme with a 9-pt font for cells and headers
tt <- ttheme_default(
  core=list(fg_params=list(fontsize=9)),
  colhead=list(fg_params=list(fontsize=9)),
  rowhead=list(fg_params=list(fontsize=9)))

# Plot on the left (3/4 of the width); on the right, a title and the
# count table, padded above and below by empty grobs
grid.arrange(plot(l28g),
             arrangeGrob(nullGrob(),
                         textGrob("Number of Responses", 
                                  gp=gpar(fontsize=11,fontface="bold")),
                         tableGrob(pisaitems %>% 
                                     rename(Country=CNT) %>% 
                                     count(Country) %>%
                                     mutate(n=format(n, big.mark=",")), 
                                   theme=tt, rows=NULL),
                         nullGrob(),
                         heights=c(15,1,5,15)),
             widths=c(3,1))


eipi10
  • That's a good point, though I'm guessing it would be difficult to create a grob table for every single question in the bar chart (given that the counts may be slightly different for each question) and make sure that they align perfectly. The other thought I had was to modify the group name on the left hand side to include the number of sample points in parentheses. For example, instead of "United States" it might be "United States (n=5,233)" for question ST24Q01. But this value might be different for the other questions. Obviously this would be simpler if results are not reported by group. – MH765 Aug 20 '16 at 21:07
  • See updated answer and let me know if that does what you need. – eipi10 Aug 20 '16 at 21:35
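The label idea from the comment above can be sketched in base R. The helper label_with_n() is hypothetical (not part of likert): it appends each group's count of non-missing answers to the group name, e.g. "United States (n=1,234)". Because counts can differ by question, it labels one item at a time.

```r
# Hypothetical helper: turn a grouping vector into a factor whose labels
# include the number of non-missing answers in each group.
#   group:   grouping vector (e.g. country)
#   present: logical vector, TRUE where the item was answered
label_with_n <- function(group, present) {
  n <- tapply(present, group, sum)  # answered count per group
  lab <- paste0(names(n), " (n=", format(n, big.mark = ",", trim = TRUE), ")")
  factor(group, levels = names(n), labels = lab)
}

g <- label_with_n(c(rep("United States", 1234), "Mexico"), rep(TRUE, 1235))
print(levels(g))
```

For a single item, something like likert(items28[, "ST24Q01", drop = FALSE], grouping = label_with_n(pisaitems$CNT, !is.na(items28$ST24Q01))) should then show the n-values in the country labels; this is untested with likert.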