
I am mainly posting because I think I am overcomplicating this. I am creating a plot of 11 different lines over time. I would like each day to be represented on the x-axis with the "title" beneath each.

I've tried a few solutions, and what I have "works" but isn't very good. Ignoring the placeholders I have in there, I would like points wherever a score increases, and a clearer view of where each person stands. My code also seems long-winded; maybe there is a better way to do this.

riddle_log <- structure(list(date = structure(c(1559779200, 1559865600, 1560124800, 
1560211200, 1560297600, 1560384000, 1560470400, 1560470400, 1560470400, 
1560729600, 1560729600, 1560816000, 1560902400, 1560988800, 1561075200, 
1561334400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    title = c("The Midget", "Bowling Balls", "Poisonous Ice", 
    "Dog Crosses River", "Camel Race", "Two Masked Men", "The Cabin", 
    "Black Truck", "Burglary", "Japanese Ship", "Haunted Floor", 
    "East and West", "Filling the Room", "Untied", "Window Jumper", 
    "Window Faller"), Brigid = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), Carly = c(0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 
    3, 3, 3, 3, 3, 3), Christian = c(1, 1, 1, 1, 1, 1, 1, 1, 
    2, 2, 3, 3, 3, 3, 4, 4), Daniel = c(0, 0, 0, 0, 0, 1, 1, 
    2, 2, 2, 2, 3, 3, 3, 3, 3.5), Jess = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Luke = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Mara = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Marcus = c(0, 0, 0, 0, 0, 
    0, 0, 0, 0, 1, 2, 2, 3, 3, 3, 3.5), Nassim = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Nathalie = c(0, 0, 1, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Neil = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-16L), class = c("tbl_df", "tbl", "data.frame"))

library(tidyverse)
library(ggthemes)

line1 <- riddle_log %>% 
  select(date, Brigid)

line2 <- riddle_log %>% 
  select(date, Carly)

line3 <- riddle_log %>% 
  select(date, Christian)

line4 <- riddle_log %>% 
  select(date, Daniel)

line5 <- riddle_log %>% 
  select(date, Jess)

line6 <- riddle_log %>% 
  select(date, Luke)

line7 <- riddle_log %>% 
  select(date, Mara)

line8 <- riddle_log %>% 
  select(date, Marcus)

line9 <- riddle_log %>% 
  select(date, Nassim)

line10 <- riddle_log %>% 
  select(date, Nathalie)

line11 <- riddle_log %>% 
  select(date, Neil)

ggplot() + 
  geom_line(data = line1, aes(x = date, y = Brigid, color = "a")) +
  geom_line(data = line2, aes(x = date, y = Carly, color = "b")) +
  geom_line(data = line3, aes(x = date, y = Christian, color = "c")) +
  geom_line(data = line4, aes(x = date, y = Daniel, color = "d")) +
  geom_line(data = line5, aes(x = date, y = Jess, color = "e")) +
  geom_line(data = line6, aes(x = date, y = Luke, color = "f")) +
  geom_line(data = line7, aes(x = date, y = Mara, color = "g")) +
  geom_line(data = line8, aes(x = date, y = Marcus, color = "h")) +
  geom_line(data = line9, aes(x = date, y = Nassim, color = "i")) +
  geom_line(data = line10, aes(x = date, y = Nathalie, color = "j")) +
  geom_line(data = line11, aes(x = date, y = Neil, color = "k")) +
  scale_color_manual(name = "Analysts", 
                     values = c("a" = "blue", "b" = "red", "c" = "orange", "d" = "black",
                                "e" = "steelblue", "f" = "blue", "g" = "blue", "h" = "blue",
                                "i" = "blue", "j" = "blue", "k" = "blue")) +
  xlab('Date') +
  ylab('Wins') +
  ggtitle(" NAME ") 

#+
 # scale_x_date(breaks = as.Date(c("2019-05-01", "2019-08-15")))



 # scale_x_discrete(name, breaks, labels, limits)

In short, I would like to add four things:

- All dates represented on the x-axis. The weekends are skipped in the data, but I don't want gaps in the plot; the days should be treated as consecutive.
- If possible, the title incorporated somehow. I am struggling to think how, since some days have multiple titles.
- A clearer way to see each line's progress, as opposed to the bad overlap that's happening here.
- Points.

If there are any themes that are better suited for this type of problem I'm open for anything.

Dag Hjermann
Johnny Thomas
  • Multiple calls to geom_line is really how not to use ggplot. Much better to use `gather` to make the data into long format and then just one call to `geom_line` – Richard Telford Jun 25 '19 at 14:04

2 Answers


First of all, you are right that your code is "a little long winded". To take advantage of ggplot you should have your data in tidy ("tall") format, with one variable for the person and another variable for that person's score. That is easy to accomplish using gather() in the tidyr package:

riddle_log2 <- riddle_log %>%
  tidyr::gather("Analyst", "Wins", Brigid:Neil)
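In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshape under that assumption. It uses a cut-down stand-in for riddle_log: the names riddle_small and riddle_long are made up for illustration, and the values are the first four rows of three analysts from the question's data.

```r
library(tidyr)

# Cut-down stand-in for riddle_log (hypothetical name; first four rows, three analysts)
riddle_small <- data.frame(
  date = as.POSIXct(c("2019-06-06", "2019-06-07", "2019-06-10", "2019-06-11"),
                    tz = "UTC"),
  title = c("The Midget", "Bowling Balls", "Poisonous Ice", "Dog Crosses River"),
  Carly = c(0, 1, 1, 1),
  Christian = c(1, 1, 1, 1),
  Nathalie = c(0, 0, 1, 2)
)

# Every column except date and title becomes an (Analyst, Wins) pair
riddle_long <- pivot_longer(
  riddle_small,
  cols = -c(date, title),
  names_to = "Analyst",
  values_to = "Wins"
)
```

The result has the same columns as the gather() version, only the row order differs (pivot_longer() works row by row, gather() column by column), which does not matter for plotting.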

Now that the data are in the preferred format for ggplot, we can plot them much more easily, like this:

ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) + 
  geom_line(size = 2)

[Figure: ggplot with default colors and equal line widths]

However, many of the lines lie on top of each other. We can try to improve the plot by drawing the first persons (who are plotted first and so end up behind the other lines) with thicker lines, for instance like this:

ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) + 
  geom_line(aes(size = Analyst)) +
  scale_size_manual(values = seq(4, 1, length = 11))

[Figure: ggplot with default colors and different line widths]

Now, this is slightly better. Next, we can improve the colors. There is a huge number of color palettes available for R. In cases such as this I often use the palettes of Paul Tol:

tol_colors = c("#332288", "#6699CC", "#88CCEE", "#44AA99", "#117733", "#999933",   
               "#DDCC77", "#661100", "#CC6677", "#882255", "#AA4499")
ggplot(riddle_log2) + 
  geom_line(aes(x = date, y = Wins, color = Analyst, size = Analyst)) +
  scale_size_manual(values = seq(5, 1, length = 11)) +
  scale_color_manual(values = tol_colors)

[Figure: ggplot with custom colors and line widths]

Now, this isn't perfect, but it is an improvement. What you should consider is splitting the plot into a set of subplots using facet_wrap():

gg <- ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) + 
  geom_line(size = 2) +
  scale_color_manual(values = tol_colors) + 
  facet_wrap(~Analyst) 
gg

[Figure: ggplot split up into one subplot per person]

This is a much better option in this case, I think.

Next, you also want the x axis to show all dates. There is too little space to label every single day, so here I will show labels for every second day:

gg + 
  scale_x_datetime(breaks = "2 day", date_labels = "%d. %b") +
  theme(axis.text.x = element_text(hjust = 0, angle = -45))

[Figure: ggplot with improved date axis]

As you can see, formatting labels isn't exactly straightforward, but it is very flexible. The codes for how to show the time/date are especially cryptic; in this case, %d indicates the day of the month and %b indicates the abbreviated month name. Other codes can be found by running ?strptime.
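The question also asked for weekends to be treated as consecutive days, which a datetime axis does not do. One possible approach, sketched here as an assumption rather than as part of the plots above, is to format each date with the same codes and use the result as an ordered factor, so that the x axis becomes discrete. The data frame riddle_long_small and the column day are hypothetical names; the values are a small excerpt of the question's data. Note that %b output depends on the locale.

```r
library(ggplot2)

# Small excerpt in long format (hypothetical name); 7 June is a Friday, 10 June a Monday
days <- as.POSIXct(c("2019-06-06", "2019-06-07", "2019-06-10", "2019-06-11"),
                   tz = "UTC")
riddle_long_small <- data.frame(
  date = days[rep(1:4, times = 2)],
  Analyst = rep(c("Carly", "Nathalie"), each = 4),
  Wins = c(0, 1, 1, 1, 0, 0, 1, 2)
)

# One factor level per date that occurs in the data, in chronological order
date_levels <- format(sort(unique(riddle_long_small$date)), "%d. %b")
riddle_long_small$day <- factor(format(riddle_long_small$date, "%d. %b"),
                                levels = date_levels)

# On a discrete axis the lines need an explicit group
p <- ggplot(riddle_long_small,
            aes(x = day, y = Wins, color = Analyst, group = Analyst)) +
  geom_line() +
  geom_point() +
  theme(axis.text.x = element_text(hjust = 0, angle = -45))
p
```

Because only dates present in the data become factor levels, Friday and Monday end up next to each other with no gap. The trade-off is that the spacing no longer reflects elapsed time.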

Finally, we're going to add the day's "title" every time the "Wins" score increases. We start by adding a variable 'Wins_increase' for the increase in Wins:

riddle_log2 <- riddle_log2 %>%
  arrange(Analyst, date) %>%                # Make sure the sorting is correct
  group_by(Analyst) %>%                     # 'Wins_increase' will be calculated for every Analyst 
  mutate(Wins_increase = Wins - lag(Wins))  # How much 'Wins' have increased since last day
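One caveat: lag() returns NA for each analyst's first row, so a win on the very first day (Christian and "The Midget" in the question's data) is dropped by a later filter on Wins_increase > 0. A possible variant, assuming everyone started from a score of 0, is to pass default = 0. The data frame wins_small and the column Wins_increase0 below are hypothetical names used only for this sketch.

```r
library(dplyr)

# Small excerpt in long format (hypothetical name), first three days for two analysts
wins_small <- data.frame(
  date = as.POSIXct(c("2019-06-06", "2019-06-07", "2019-06-10"), tz = "UTC")[rep(1:3, times = 2)],
  title = rep(c("The Midget", "Bowling Balls", "Poisonous Ice"), times = 2),
  Analyst = rep(c("Carly", "Christian"), each = 3),
  Wins = c(0, 1, 1, 1, 1, 1)
)

wins_small <- wins_small %>%
  arrange(Analyst, date) %>%
  group_by(Analyst) %>%
  mutate(
    Wins_increase  = Wins - lag(Wins),              # NA on each analyst's first day
    Wins_increase0 = Wins - lag(Wins, default = 0)  # assumes a starting score of 0
  ) %>%
  ungroup()

# Christian's first-day win is only picked up by the second variant
filter(wins_small, Wins_increase0 > 0)
```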

Then we use geom_text() to add rotated labels:

gg + scale_x_datetime(breaks = "2 day", date_labels = "%d. %b") +  # as before
  theme(axis.text.x = element_text(hjust = 0, angle = -45)) +      # as before
  geom_text(data = riddle_log2 %>% filter(Wins_increase > 0),      # Pick only where "Wins" is increasing
            aes(y = Wins + 0.3, label = title),                    # We add 0.3 to lift the labels a bit
            hjust = 0, angle = 90, size = 2)                       # Left-adjust and rotate labels

[Figure: ggplot with labels added]

The next thing to fix is the overlap between labels for Marcus (because he won twice on the same day). This can be fixed using the ggrepel package.
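A minimal sketch of what that could look like, assuming ggrepel is installed. It swaps geom_text() for ggrepel's geom_text_repel(), which nudges overlapping labels apart. The data frame marcus is a hypothetical excerpt of Marcus's rows from the question's data, with Wins_increase filled in by hand; the seed value is arbitrary and only makes the label placement reproducible.

```r
library(ggplot2)
library(ggrepel)  # assumed to be installed

# Excerpt of Marcus's scores (hypothetical name); two wins fall on 17 June
marcus <- data.frame(
  date = as.POSIXct(c("2019-06-14", "2019-06-17", "2019-06-17",
                      "2019-06-18", "2019-06-19"), tz = "UTC"),
  title = c("Burglary", "Japanese Ship", "Haunted Floor",
            "East and West", "Filling the Room"),
  Wins = c(0, 1, 2, 2, 3),
  Wins_increase = c(0, 1, 1, 0, 1)
)

p <- ggplot(marcus, aes(x = date, y = Wins)) +
  geom_line() +
  geom_point() +
  geom_text_repel(data = subset(marcus, Wins_increase > 0),  # label only the increases
                  aes(label = title),
                  size = 2,
                  seed = 42)                                 # reproducible placement
p
```

The same layer should work in the faceted plot in place of the geom_text() call, although the rotation and offsets used there would probably need adjusting.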

Dag Hjermann
  • wow awesome stuff; Now do you know how to have each date represented on here? Probably on a diagonal would be sufficient. It would also be cool to have the "title" somewhere around the point where the analyst gets the respective increase. – Johnny Thomas Jun 25 '19 at 15:04
  • Added formatting for showing every second date (changing it to every date is straightforward). If you want to add the name of the analyst in different places in each facet (in the last two examples), you need to create a data frame with 11 lines and four variables (x, y, Analyst, and Label). See [this answer](https://stackoverflow.com/a/47836541/1734247). x and y can of course be calculated using derivatives or something if you wish. – Dag Hjermann Jun 26 '19 at 12:25
  • I see; thanks for the dates. What I meant for the latter part was there is a variable called "title". It would be cool to have the title name at every increase in value for each analyst. – Johnny Thomas Jun 26 '19 at 14:02
  • For example, when Christian gets a one point increase in the beginning, the title value is "The Midget". I would like this to be on the plot; same for each value of that variable – Johnny Thomas Jun 26 '19 at 14:07
  • @JohnnyThomas Added to solution. You can play around with ggrepel to improve the labelling. – Dag Hjermann Jun 27 '19 at 07:21
  • great stuff man, hope to get on your level one day – Johnny Thomas Jun 27 '19 at 14:14

Here's an example of converting to "long" data to make ggplot easier. I also added a geom_jitter layer to make it easier to see days with overlaps.

riddle_log %>%
  tidyr::gather(Analyst, Wins, -c(date, title)) %>%
  ggplot(aes(x = date, y = Wins, color = Analyst)) +
  geom_line() +
  geom_jitter(width = 0, shape = 21, alpha = 0.7) + # one way to show daily overlap
  scale_color_manual(name = "Analysts", 
                     values = c("Brigid" = "blue", "Carly" = "red", 
                                "Christian" = "orange", "Daniel" = "black",
                                "Jess" = "steelblue", "Luke" = "blue", 
                                "Mara" = "blue", "Marcus" = "blue",
                                "Nassim" = "blue", "Nathalie" = "blue", 
                                "Neil" = "blue"))


Jon Spring