I need to make a line plot in ggplot2 using a data frame called df that looks something like this:

   DATE       ITEM        NUMBER_SOLD
    <date>     <chr>       <int>
 1 2018-01-08 APPLE         3
 2 2018-01-09 APPLE         3
 3 2018-01-09 PEAR          2
 4 2018-01-09 ORANGE        1
 5 2018-01-10 APPLE         2
 6 2018-01-10 PEAR          1
 7 2018-01-12 CHERRY        2
 8 2018-01-12 MANGO         1
 9 2018-01-15 PINEAPPLE     1
10 2018-01-15 APRICOT       1

etc

The data frame is a tibble with 336 rows in total, showing how many times each item was sold on a given day in 2018.

The plot needs to be a line plot showing the sales of one particular item (apple), with the date on the x axis and the number sold on the y axis, plus a second line showing a 15% increase in sales, like this:

df %>% filter(ITEM == "APPLE") %>%
  ggplot(aes(DATE, NUMBER_SOLD)) +
  geom_line(size = 1, col = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  geom_line(aes(y = NUMBER_SOLD + NUMBER_SOLD/100*15), col = "green4", size = 1, alpha = 0.6) +
  scale_x_date(date_labels="%b", date_breaks  = "1 month")

However, I also need to add a legend showing what each line represents, e.g. the red line for the original number of sales and the green one for the original number of sales + 15%. How might I achieve that?

  • I personally always recommend pictures to show the current and expected plot(s). – NelsonGon Mar 21 '19 at 04:54
  • See questions like [this](https://stackoverflow.com/questions/17148679/construct-a-manual-legend-for-a-complicated-plot), which use the `aes(colour = "Label 1")` trick. The alternative is to gather the data into a long format and plot them as a single layer, which will allow ggplot's automatic legending to work. – Marius Mar 21 '19 at 05:00
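
For readers who want to see the `aes(colour = "Label 1")` trick from the comment above spelled out, here is a minimal sketch. It assumes a small stand-in data frame (`apples`, containing only the apple rows from the question) and two hypothetical legend labels, "Sold" and "Sold + 15%"; neither name comes from the original post.

```r
library(ggplot2)

# Stand-in for the asker's data: only the APPLE rows shown in the question
apples <- data.frame(
  DATE = as.Date(c("2018-01-08", "2018-01-09", "2018-01-10")),
  NUMBER_SOLD = c(3, 3, 2)
)

# Mapping colour to a constant string inside aes() makes ggplot2 build a legend.
# The strings are hypothetical labels; they only need to match the names in `values`.
p_manual <- ggplot(apples, aes(DATE)) +
  geom_line(aes(y = NUMBER_SOLD, colour = "Sold")) +
  geom_line(aes(y = NUMBER_SOLD * 1.15, colour = "Sold + 15%")) +
  scale_colour_manual(
    name = NULL,
    values = c("Sold" = "red", "Sold + 15%" = "green4")
  )

p_manual
```

This keeps the data in its original wide shape, at the cost of one geom_line() call per legend entry.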

1 Answer

The trick is to do the calculation in the data frame first, then use gather() to reshape the data to long format, so that all the numbers sit in one column and a second column indicates whether each number is the actual or the expected sales figure.

library(tidyverse)

df <- tribble(~"DATE",       ~"ITEM",        ~"NUMBER_SOLD",
"2018-01-08", "APPLE",         3,
"2018-01-09", "APPLE",         3,
"2018-01-09", "PEAR",          2,
"2018-01-09", "ORANGE",        1,
"2018-01-10", "APPLE",         2,
"2018-01-10", "PEAR",          1,
"2018-01-12", "CHERRY",        2,
"2018-01-12", "MANGO",         1,
"2018-01-15", "PINEAPPLE",     1,
"2018-01-15", "APRICOT",       1) %>% 
  mutate(DATE = parse_date(DATE),
         NUMBER_SOLD_EXP = NUMBER_SOLD + NUMBER_SOLD/100*15) %>% 
  gather(key = category, value = SOLD, NUMBER_SOLD, NUMBER_SOLD_EXP)

df
# A tibble: 20 x 4
   DATE       ITEM      category         SOLD
   <date>     <chr>     <chr>           <dbl>
 1 2018-01-08 APPLE     NUMBER_SOLD      3   
 2 2018-01-09 APPLE     NUMBER_SOLD      3   
 3 2018-01-09 PEAR      NUMBER_SOLD      2   
 4 2018-01-09 ORANGE    NUMBER_SOLD      1   
 5 2018-01-10 APPLE     NUMBER_SOLD      2   
 6 2018-01-10 PEAR      NUMBER_SOLD      1   
 7 2018-01-12 CHERRY    NUMBER_SOLD      2   
 8 2018-01-12 MANGO     NUMBER_SOLD      1   
 9 2018-01-15 PINEAPPLE NUMBER_SOLD      1   
10 2018-01-15 APRICOT   NUMBER_SOLD      1   
11 2018-01-08 APPLE     NUMBER_SOLD_EXP  3.45
12 2018-01-09 APPLE     NUMBER_SOLD_EXP  3.45
13 2018-01-09 PEAR      NUMBER_SOLD_EXP  2.3 
14 2018-01-09 ORANGE    NUMBER_SOLD_EXP  1.15
15 2018-01-10 APPLE     NUMBER_SOLD_EXP  2.3 
16 2018-01-10 PEAR      NUMBER_SOLD_EXP  1.15
17 2018-01-12 CHERRY    NUMBER_SOLD_EXP  2.3 
18 2018-01-12 MANGO     NUMBER_SOLD_EXP  1.15
19 2018-01-15 PINEAPPLE NUMBER_SOLD_EXP  1.15
20 2018-01-15 APRICOT   NUMBER_SOLD_EXP  1.15
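
As a side note, in tidyr 1.0.0 and later gather() is superseded by pivot_longer(). A rough equivalent of the reshaping step might look like the sketch below; the three-row `sales` data frame is a stand-in for the asker's data, and the row order of the result differs from gather() (the two categories are interleaved per original row rather than stacked).

```r
library(dplyr)
library(tidyr)

# Stand-in for the asker's data: the first three rows from the question
sales <- data.frame(
  DATE = as.Date(c("2018-01-08", "2018-01-09", "2018-01-09")),
  ITEM = c("APPLE", "APPLE", "PEAR"),
  NUMBER_SOLD = c(3, 3, 2)
)

sales_long <- sales %>%
  # Expected sales: the original figure plus 15%
  mutate(NUMBER_SOLD_EXP = NUMBER_SOLD * 1.15) %>%
  # Stack the two numeric columns into one, keeping the source column name in `category`
  pivot_longer(
    cols = c(NUMBER_SOLD, NUMBER_SOLD_EXP),
    names_to = "category",
    values_to = "SOLD"
  )

sales_long
```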

Now you only need to call geom_line() once, mapping the colour aesthetic to the variable that indicates whether the number is actual or expected sales. Add scale_colour_manual() to specify which colour goes with each category.

df %>% filter(ITEM == "APPLE") %>%
  ggplot(aes(DATE, SOLD)) +
  geom_line(aes(colour = category), size = 1) +
  scale_colour_manual(values = c("NUMBER_SOLD" = "red", "NUMBER_SOLD_EXP" = "green")) +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_date(date_labels="%b", date_breaks  = "1 month")
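
The legend produced this way shows the raw category names. If more readable entries are wanted, one option is to pass `name` and `labels` to scale_colour_manual(), as in the sketch below. The `apples_long` data frame is a stand-in holding only the apple rows in long format, and the label texts ("Original", "Original + 15%") and the legend title "Sales" are hypothetical choices. The colour "green4" is taken from the question rather than the "green" used above.

```r
library(ggplot2)

# Stand-in for the filtered long data: apple rows only, actual and expected sales
apples_long <- data.frame(
  DATE = as.Date(rep(c("2018-01-08", "2018-01-09", "2018-01-10"), times = 2)),
  category = rep(c("NUMBER_SOLD", "NUMBER_SOLD_EXP"), each = 3),
  SOLD = c(3, 3, 2, 3.45, 3.45, 2.3)
)

p_labelled <- ggplot(apples_long, aes(DATE, SOLD, colour = category)) +
  geom_line() +
  scale_colour_manual(
    name = "Sales",  # hypothetical legend title
    values = c("NUMBER_SOLD" = "red", "NUMBER_SOLD_EXP" = "green4"),
    # Hypothetical display labels, matched to the categories by name
    labels = c("NUMBER_SOLD" = "Original", "NUMBER_SOLD_EXP" = "Original + 15%")
  )

p_labelled
```

In ggplot2 3.4.0 and later, line thickness is set with `linewidth` rather than `size`, so `geom_line(linewidth = 1)` avoids a deprecation warning there.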

Phil