
I am a beginner in R. I got stuck writing an R Markdown report, using ggplot, on a survey. The sample size is small at present. It got me thinking: how can I best visualize the answers, given that three answers are possible ("Yes", "No", "Uncertain") and I want readers to see at a glance that all three were possible but some options were not chosen? The code below reproduces my current data for that question:

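# Simulated survey data: two questions (X and Y), answered "Yes" or "No";
# "Uncertain" was a possible answer, but nobody chose it.
# The prob weights need not sum to 1; sample() rescales them.
df.YesNoUncertain <- data.frame(
  X = sample(c("Yes", "No"), 11, replace = TRUE, prob = c(.99, .001)),
  Y = sample(c("Yes", "No"), 11, replace = TRUE, prob = c(.9, .2)),
  stringsAsFactors = FALSE
)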

I thought of maybe using pie charts, but then the answers that were not chosen ("Uncertain") are not shown. Are there better ways to do this?

r0berts
  • Not really a question for SO, but get inspired here: [https://www.r-graph-gallery.com/](https://www.r-graph-gallery.com/). – jay.sf Jun 22 '18 at 07:22

1 Answer

I dislike pie charts (for various reasons, see e.g. this post), so how about something like this?

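library(dplyr)    # %>%, mutate(), count()
library(tidyr)    # gather(), complete()
library(ggplot2)

df.YesNoUncertain %>%
    gather(Group, Response) %>%
    # Declare all three possible answers as factor levels
    mutate(Response = factor(Response, levels = c("Yes", "No", "Uncertain"))) %>%
    count(Group, Response) %>%
    # Add the Group/Response combinations that never occurred, with n = 0
    complete(Group, Response, fill = list(n = 0)) %>%
    ggplot(aes(Response, n, fill = Group)) + geom_col(position = "dodge")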


Readers can easily identify zero count responses, e.g. there are zero "No" responses in Group "X", and there are zero "Uncertain" responses in both groups.
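To see why the unchosen answers still get a slot on the x-axis, it may help to look at the table the pipeline hands to ggplot. The following is only a minimal sketch on a small, fixed toy data frame (`toy` and `counts` are illustrative names, not part of the question's data); it assumes `dplyr` and `tidyr` are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: nobody answered "Uncertain",
# and nobody answered "No" to question X
toy <- data.frame(
  X = c("Yes", "Yes", "Yes"),
  Y = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  gather(Group, Response) %>%
  mutate(Response = factor(Response, levels = c("Yes", "No", "Uncertain"))) %>%
  count(Group, Response) %>%
  # complete() expands to every Group x factor-level combination
  complete(Group, Response, fill = list(n = 0))

print(counts)
```

Because `Response` is a factor with all three levels declared, `complete()` adds a row with `n = 0` for every combination that never occurred, so each group ends up with one row per possible answer.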


Update

To show percentages, you can do the following:

df.YesNoUncertain %>%
    gather(Group, Response) %>%
    mutate(Response = factor(Response, levels = c("Yes", "No", "Uncertain"))) %>%
    count(Group, Response) %>%
    complete(Group, Response, fill = list(n = 0)) %>%
    group_by(Group) %>%
    mutate(Percentage = n / sum(n) * 100) %>%
    ggplot(aes(Response, Percentage, fill = Group)) + geom_col(position = "dodge")


Alternatively, you can also use scales::percent, see e.g. ggplot replace count with percentage in geom_bar.
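A rough sketch of the `scales::percent` variant, under the assumption that the y values are kept as proportions between 0 and 1 and only the axis labels are formatted as percentages. The toy data frame and the names `toy`, `props` and `p` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for the survey answers
toy <- data.frame(
  X = c("Yes", "Yes", "Yes"),
  Y = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

props <- toy %>%
  gather(Group, Response) %>%
  mutate(Response = factor(Response, levels = c("Yes", "No", "Uncertain"))) %>%
  count(Group, Response) %>%
  complete(Group, Response, fill = list(n = 0)) %>%
  group_by(Group) %>%
  mutate(Proportion = n / sum(n)) %>%   # proportion within each question
  ungroup()

# scales::percent is used as a labelling function, so 0.5 is shown as 50%
p <- ggplot(props, aes(Response, Proportion, fill = Group)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent)
```

Compared with multiplying by 100 by hand, this keeps the data as plain proportions and leaves the formatting to the axis scale.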

Maurits Evers
  • This is really good and I agree about pie charts; I think I should have been more specific though. X and Y are answers to different questions, so it would be difficult to read if answers to the same question are found in different groupings (intuitively, the first group of bars reads like the answers to the first question, and it takes some thought to understand that the question is represented in two places) – r0berts Jun 22 '18 at 09:38
  • @r0berts *"it takes some thought to understand that the question is represented in two places"* Sorry, but I have no idea what that means; questions (groups in my plot) are shown *next to each other*; responses are shown *at different "places"* along the x-axis. Stacked/dodged bar charts are what they are: they are probably most suitable for visualising categorical data, and I would say they are straightforward to read; but I agree that depends on your target audience. Perhaps take a look at the link jay.sf gives for some inspiration. – Maurits Evers Jun 22 '18 at 10:27
  • You are quite right, of course. To display the bars grouped by question instead, all it takes is to swap Group and Response in the ggplot line. I am also looking into whether I can display the bar chart as percentages. – r0berts Jun 22 '18 at 10:42
  • @r0berts I've added an example how to show percentages (instead of counts); it might be useful. Good luck with your work! – Maurits Evers Jun 22 '18 at 12:04
  • I arrived at a somewhat similar solution, but because I was new to it, I spent a long time testing it step by step. Is it right that %>% passes the resulting data frame into the next function? So in step 5 the fields from step 1 may no longer be available? – r0berts Jun 22 '18 at 13:58
  • @r0berts Yes, the `magrittr` pipe operator `%>%` takes the left-hand side value and pipes it into the expression on the right-hand side. So `x %>% f()` is the same as `f(x)`. It works for any R object (not only `data.frame`s) as long as the object is valid as the first argument of the expression on the right-hand side. – Maurits Evers Jun 22 '18 at 14:04
  • @Maurits Evers With percentages there is a weird thing. I am using the `tikz` device instead of `pdf` because the plot fonts seem better, but percent signs in the plot give an error. Have you ever come across this problem? I can still use `pdf`, but out of curiosity: is it possible to use `tikz` and still have a percent scale? – r0berts Jun 22 '18 at 14:58
  • @r0berts Sorry, I've never used `tikzDevice` before. Perhaps try using the unicode character `\u0025` for percentage instead of the sign? – Maurits Evers Jun 22 '18 at 15:21
  • Thanks, it was hard to find, but I got it. If you use `tikzDevice`, you can set `sanitize=TRUE` in the chunk options, which solves it. Tikz gives better-scaled graphs. See https://tex.stackexchange.com/questions/209837/using-in-a-tikz-plot-generated-by-knitr – r0berts Jun 22 '18 at 15:41
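
The swap of Group and Response discussed in the comments could look roughly like the sketch below, which puts the questions on the x-axis and uses fill colour for the answers. The toy data, the names `toy`, `counts` and `p_by_question`, and the axis labels are assumptions made for illustration only.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for the survey answers
toy <- data.frame(
  X = c("Yes", "Yes", "Yes"),
  Y = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  gather(Group, Response) %>%
  mutate(Response = factor(Response, levels = c("Yes", "No", "Uncertain"))) %>%
  count(Group, Response) %>%
  complete(Group, Response, fill = list(n = 0))

# Group (the question) on the x-axis, Response as the fill colour
p_by_question <- ggplot(counts, aes(Group, n, fill = Response)) +
  geom_col(position = "dodge") +
  labs(x = "Question", y = "Count")
```

Since the zero-count rows are kept by `complete()`, all three answers should still appear in the legend, and each question has a zero-height slot for the answers nobody chose.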