I've been sent the CSV output from a Google Forms questionnaire and asked to create summary plots in R. However, I've reached a major stumbling block when it comes to analysing the results from a checkbox grid as in the linked image: https://i.stack.imgur.com/0QXZb.png
Each participant had been asked to specify the age of all of their children. Child number was shown along the top of the grid (e.g. 'Child 1', 'Child 2' etc.) and age bracket in a column down the left-hand side (e.g. 10-13, 14-18, etc.). Multiple responses could be selected from the grid and it's this that's giving me a headache.
When it comes to the CSV output, results for the problem question have been separated so that they occur across multiple columns. Age brackets are displayed as separate columns and multiple responses can occur within a cell (see a very small example below). The real dataset contains several hundred participants and the results have been subsetted according to multiple criteria.
x10a.6.9         x10a.10.13       x10a.14.18  x10a.19.23
child 2;child 3                   child 1
                 child 1
child 3;child 4  child 2                      child 1
                 child 1;child 2
Edit: Reproducible version of the ugly table (thanks for the link Mojoesque):
data_to_split <- structure(list(
  x10a.6.9 = c("child 2;child 3", NA, "child 3;child 4", NA),
  x10a.10.13 = c(NA, "child 1", "child 2", "child 1;child 2"),
  x10a.14.18 = c("child 1", NA, NA, NA),
  x10a.19.23 = c(NA, NA, "child 1", NA)
), row.names = c(NA, -4L), class = "data.frame")
I would like to know how I would go about reshaping this data so that it can be presented in simple bar charts. I don't know how to arrange this data in order to get it to cooperate with ggplot2. I would, if at all possible, like it to look like the summary image produced in Google Forms. As it stands, I have no idea how to use data like this to plot age along the x-axis and count along the y-axis. I want to show this separately for each child number as in the attached image, but do not know where to begin.
Any and all help would be greatly appreciated. I apologise profusely if the question is worded poorly, and apologise too for my incredible naivety.
Edit: Resolution
I played around with this a little and figured out how to plot what I wanted. I'll post the steps used below in case they are useful to anyone else.
In the code below, data_to_split corresponds to the small snippet of a table that was shown above.
library(tidyr)
library(dplyr)
library(ggplot2)
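data_split <- data_to_split %>%
  # Split cells such as "child 1;child 2" into one row per child
  separate_rows(x10a.6.9:x10a.19.23, sep = ";") %>%
  select(x10a.6.9:x10a.19.23) %>%
  # Turn empty strings into NA, then convert every column to a factor
  mutate(across(everything(), ~ as.factor(na_if(.x, ""))))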
It was necessary to use the separate_rows function from tidyr, as multiple responses could occur within a cell (e.g. a cell could read 'child 1;child 2'). The separator ";" was used to split these cells into multiple rows.
na_if was used to convert empty strings (blank cells that would otherwise become a factor level of their own) into NA values.
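To make those two steps easier to follow, here is a minimal sketch on a single column. The data frame `toy` and its columns `id` and `children` are made up for illustration and are not part of the real dataset.

```r
library(tidyr)
library(dplyr)

# Hypothetical one-column example; `toy`, `id` and `children` are illustrative names
toy <- data.frame(
  id = 1:3,
  children = c("child 2;child 3", "", "child 1"),
  stringsAsFactors = FALSE
)

toy_split <- toy %>%
  separate_rows(children, sep = ";") %>%   # one row per child listed in a cell
  mutate(children = na_if(children, ""))   # blank cells become NA

toy_split
```

The first participant ends up on two rows (one per child), and the blank cell of the second participant becomes NA, so it can be dropped later with drop_na().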
In the code below, new columns were added. These columns simply provided new names for the columns that had been read into R. The old ones were ugly and harder to work with.
data_split$`6-9` <- data_split$x10a.6.9
data_split$`10-13` <- data_split$x10a.10.13
data_split$`14-18` <- data_split$x10a.14.18
data_split$`19-23` <- data_split$x10a.19.23
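As a possible alternative to copying the columns, dplyr's rename() changes the names in place, so the old columns do not linger in the data frame. A small sketch, using a made-up two-row stand-in called `demo` rather than the real data_split:

```r
library(dplyr)

# Illustrative stand-in for data_split (hypothetical values)
demo <- data.frame(
  x10a.6.9   = c("child 2", NA),
  x10a.10.13 = c(NA, "child 1")
)

# rename(new_name = old_name); backticks are needed because the new
# names start with a digit and contain a hyphen
demo_renamed <- demo %>%
  rename(`6-9` = x10a.6.9, `10-13` = x10a.10.13)

names(demo_renamed)
```

Either approach leaves columns named `6-9`, `10-13` and so on, which is all the later select() call relies on.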
In the first two lines of code below, all relevant columns containing age brackets were selected and the data was converted from wide to long format. In the third line, NA values were dropped as I did not wish to display them, and tally() was used to obtain an n value that could be used to display count along the y-axis. In the fourth line, it was necessary to re-order the age brackets so that they were shown along the x-axis in ascending order (by default they would be sorted alphabetically, putting "10-13" before "6-9"). The rest of the lines build a basic bar plot.
age_plot <- data_split %>% select(`6-9`:`19-23`) %>%
pivot_longer(., cols = c(`6-9`:`19-23`), names_to = "Var", values_to = "Val") %>%
drop_na() %>% group_by(Var, Val) %>% tally() %>%
ggplot(aes(x = factor(Var, levels = c("6-9", "10-13", "14-18", "19-23")), y = n,
fill = factor(Val, levels = c("child 1", "child 2", "child 3", "child 4")))) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
theme_classic() + labs(fill = "Child", x = "Age", y = "Number")
age_plot
The result looked like this. (Obviously this graph looks a little odd with so few data points, but the real thing looked good!)
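One thing worth checking in the steps above: when separate_rows() is given several columns at once, my understanding is that the pieces within each row are recycled to a common length. If so, a single answer sitting next to a two-child cell (such as "child 1" in the first row of x10a.14.18) would be repeated and counted twice. A hedged sketch of a variation that sidesteps this is to pivot to long format first and only then split the single value column. The names `age`, `child` and `age_counts` are my own choices, and the regular expressions assume the column names follow the x10a.<from>.<to> pattern shown in the question.

```r
library(tidyr)
library(dplyr)

# Same toy data as in the question
data_to_split <- data.frame(
  x10a.6.9   = c("child 2;child 3", NA, "child 3;child 4", NA),
  x10a.10.13 = c(NA, "child 1", "child 2", "child 1;child 2"),
  x10a.14.18 = c("child 1", NA, NA, NA),
  x10a.19.23 = c(NA, NA, "child 1", NA),
  stringsAsFactors = FALSE
)

age_counts <- data_to_split %>%
  # Long format first: one row per participant and age bracket
  pivot_longer(everything(), names_to = "age", values_to = "child") %>%
  # Split only the single value column; tolerate a space after ";"
  separate_rows(child, sep = ";\\s*") %>%
  # Drop unanswered cells (NA or blank)
  filter(!is.na(child), child != "") %>%
  # "x10a.6.9" -> "6.9" -> "6-9"
  mutate(age = sub("^x10a\\.", "", age),
         age = sub(".", "-", age, fixed = TRUE)) %>%
  count(age, child)

age_counts
```

The resulting age_counts table has one row per age bracket and child with a count column n, so it can be passed to the same ggplot call as before (mapping age to x and child to fill). Adding facet_wrap(~ child) would be one way to get a separate panel per child, closer to the Google Forms summary.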