A key part of ggplot2 is that the data must be tidy for it to work properly. This can be a bit of a hassle sometimes, but it usually pays off.
This is my full solution. Most of the work goes into getting the data into tidy format; after that, the ggplot2 part is a lot easier:
library(dplyr)
library(ggplot2)
library(readxl)
library(tidyr)
sugar <- read_excel("data/MakeoverMondayData.xlsx")
children_2014_2016 <- sugar %>%
  gather("period", "intake", -1) %>%
  separate(1, c("category", "age"), sep = " ", extra = "merge") %>%
  filter(
    category == "Children",
    period == "(2014/15-2015/16)"
  ) %>%
  mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years", "11-18 years"), ordered = TRUE))
label_ <- data.frame(x = 2, y = 5, label = "5% of total energy")
children_2014_2016 %>%
  ggplot() +
  geom_bar(stat = "identity", fill = "lightblue", aes(x = age, y = intake)) +
  geom_hline(yintercept = 5, linetype = "dashed", colour = "grey") +
  geom_text(data = label_, aes(x = x, label = label, y = y)) +
  ggtitle("Children's free sugars intake (as % of total energy)") +
  labs(x = "Age", y = "Free sugars as % of total energy") +
  theme_minimal()

Now I'll try to explain how it works:
- The first step is to make the data tidy. For that, I use tidyr::gather to collapse the columns into two new columns, period and intake. The -1 means that I'm gathering all but the first column.
gather("period", "intake", -1)
- Separate the first column so I have better control over the filtering in the next step. I'm separating the first column into two new columns, category (Children, Adult, etc.) and age. The extra = "merge" argument is there because splitting on whitespace would otherwise produce more than two pieces, so I merge everything after the first space into the last column.
separate(1, c("category", "age"), sep = " ", extra = "merge")
- Filter by category and period. This is fairly straightforward:
filter(
  category == "Children",
  period == "(2014/15-2015/16)"
)
- Mutate the age column to be an ordered factor, so I can control the order in which the categories appear in the plot:
mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years", "11-18 years"), ordered = TRUE))
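To see the four steps in isolation, here is a minimal sketch on a toy data frame. The column name group, the second period label, and all the numbers are made up for illustration; only the shape (labels in the first column, one column per period) is meant to resemble the spreadsheet.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the spreadsheet: group labels in the first column,
# one column per survey period. All values are invented for illustration.
toy <- data.frame(
  group = c("Children 1.5-3 years", "Children 4-10 years", "Adults 19-64 years"),
  `(2012/13-2013/14)` = c(12, 14, 12),
  `(2014/15-2015/16)` = c(11, 13, 11),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

tidy_toy <- toy %>%
  # wide to long: every column except the first becomes a period/intake pair
  gather("period", "intake", -1) %>%
  # split at the first space only; the remainder stays together in age
  separate(1, c("category", "age"), sep = " ", extra = "merge") %>%
  filter(
    category == "Children",
    period == "(2014/15-2015/16)"
  ) %>%
  # ordered factor so the age groups keep this order on the x axis
  mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years"), ordered = TRUE))

tidy_toy
```

Running the pipeline one step at a time (dropping the later steps) is a handy way to inspect what each verb does to the data. Note that tidyr now documents gather as superseded by pivot_longer, although gather still works.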
After this, everything but the label "5% of total energy" is pretty standard ggplot2, I think.
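For the label, the idea is that geom_text gets its own one-row data frame, so the text is drawn once instead of once per bar. Here is a small sketch of just that part; the intake values are hypothetical, and x = 2 refers to the second position on the discrete age axis.

```r
library(ggplot2)

# Hypothetical intake values, for illustration only
toy <- data.frame(
  age = factor(
    c("1.5-3 years", "4-10 years", "11-18 years"),
    levels = c("1.5-3 years", "4-10 years", "11-18 years"),
    ordered = TRUE
  ),
  intake = c(11, 13, 14)
)

# One row means one label; x = 2 lines it up with the middle bar
label_ <- data.frame(x = 2, y = 5, label = "5% of total energy")

p <- ggplot(toy) +
  geom_bar(stat = "identity", fill = "lightblue", aes(x = age, y = intake)) +
  geom_hline(yintercept = 5, linetype = "dashed", colour = "grey") +
  # this layer ignores toy and reads only from label_
  geom_text(data = label_, aes(x = x, y = y, label = label))

p
```

The bar layer has to come first so that the x axis is set up as discrete before the numeric x = 2 is added.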