
I'm currently trying to make my own graphical timeline like the one at the bottom of this Wikipedia page (the URL is in the code below). I scraped the table from that page using the rvest package and cleaned it up.

Here is my code:

library(tidyverse)
library(rvest)
library(ggthemes)
library(lubridate)

URL <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"

justices <- URL %>% 
  read_html %>%  
  html_node("table.wikitable") %>% 
  html_table(fill = TRUE) %>% 
  data.frame()

# Removes weird row at bottom of the table
n <- nrow(justices)
justices <- justices[1:(n - 1), ]

# Separating the information I want
justices <- justices %>% 
  separate(Justice.2, into = c("name","year"), sep = "\\(") %>% 
  separate(Tenure, into = c("start", "end"), sep = "\n–") %>% 
  separate(end, into = c("end", "reason"), sep = "\\(") %>% 
  select(name, start, end) 

# Removes Wikipedia footnote tags ([e], [m], [j]) from the start column
justices$start <- gsub('\\[e\\]$|\\[m\\]|\\[j\\]$', '', justices$start)
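If more one-letter footnote tags turn up, a single character class may be easier to maintain than listing each tag. Here is a minimal sketch on made-up strings (not the actual scraped values); note that, unlike the alternation above, it removes such a tag anywhere in the string rather than only at the end for [e] and [j]:

```r
# Made-up examples of scraped start dates carrying Wikipedia footnote tags
raw_start <- c("January 1, 1800[e]", "March 2, 1810", "May 3, 1820[m]", "July 4, 1830[j]")

# "\\[[a-z]\\]" matches a literal "[", exactly one lowercase letter, then "]"
clean_start <- gsub("\\[[a-z]\\]", "", raw_start)
print(clean_start)
```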

justices$start <- mdy(justices$start)

# This will replace incumbencies with NA
justices$end <- mdy(justices$end)

# Incumbent judges are still around, so use today's date as their end date
justices$end[is.na(justices$end)] <- today()

# start and end are already Date objects after mdy(), so no further as.Date() conversion is needed

justices %>% 
  ggplot(aes(reorder(x = name, X = start))) +
  geom_segment(aes(xend = name,
                   yend = start,
                   y = end)) +
  coord_flip() + 
  scale_y_date(date_breaks = "20 years", date_labels = "%Y") +
  theme(axis.title = element_blank()) +
  theme_fivethirtyeight() +
  NULL

This is the output from ggplot (I'm not worried about aesthetics yet; I know it looks terrible!):

The goal for this plot is to order the judges chronologically by start date, so the judge with the earliest start date should be at the bottom and the judge with the most recent one at the top. As you can see, there are multiple places where this rule is broken.

Instead of sorting chronologically, it simply lists the judges in the order they appear in the data frame, which is also the order Wikipedia uses. If the sorting worked, every line segment would start further right than the one below it.

My understanding of reorder() is that it takes the X = start argument, sorts by it, and lists the names in that order.
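For reference, reorder()'s FUN argument defaults to mean, so a name that appears on several rows (a justice with more than one appointment) is placed at the average of its start dates rather than at its first one. A minimal sketch with hypothetical justices and dates:

```r
# Hypothetical toy data: "Justice A" was appointed twice, "Justice B" once
toy <- data.frame(
  name  = c("Justice A", "Justice B", "Justice A"),
  start = as.Date(c("1790-01-01", "1792-01-01", "1796-01-01"))
)

# Default FUN = mean: A's average start (about 1793) falls after B's 1792 start
by_mean <- levels(reorder(toy$name, toy$start))

# FUN = min: each name is ranked by its earliest start date instead
by_min <- levels(reorder(toy$name, toy$start, FUN = min))

print(by_mean)  # [1] "Justice B" "Justice A"
print(by_min)   # [1] "Justice A" "Justice B"
```

Summarising with min() is the same idea as the min() passed to fct_reorder() in the second answer below.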

The only help I could find for this problem was to convert the dates to a factor and order them that way, but then I get the error

Error: Invalid input: date_trans works with objects of class Date only.

Thank you for your help!

Richard Telford
stoa
  • I think it is ordering them by `year`, not the full date, then by `name`. Seems the trick is getting reorder to use the full date. – Anonymous coward Jun 29 '18 at 22:06
  • I thought this too, but if you look at the bottom of the plot there is a group of three short lines that are entirely out of order, even by year. Now that I'm looking at it, though, I wonder if my geom_segment code is also wrong... – stoa Jun 29 '18 at 22:10
  • Oh, you're right. It's putting Thomas Johnson before John Rutledge. – Anonymous coward Jun 29 '18 at 22:12
  • Okay, so it looks like the lines for judges who had more than one appointment are being placed in the middle of where they are actually supposed to be. – stoa Jun 29 '18 at 22:23
  • I would love to see it! – stoa Jun 29 '18 at 22:28
  • Sorry, I left out some code in the comment. I will have to post this as an answer, even though it breaks several aspects that are currently working. – Hack-R Jun 29 '18 at 22:34

2 Answers


I would make this a comment, but I couldn't fit it.

This was an attempt I gave up on. It looks like it does fix the ordering problem, but it broke several other aspects of the formatting and I've run out of time to fix them.

# Sort rows from the latest to the earliest start date
justices <- justices[order(justices$start, decreasing = TRUE), ]
any(diff(justices$start) > 0) # FALSE, i.e. it works

# One numeric y position per row: the earliest start gets id 1 (bottom of the plot)
justices$id <- nrow(justices):1

ggplot(data = justices, mapping = aes(x = start, y = id)) +
  scale_x_date(date_breaks = "20 years", date_labels = "%Y") +
  # id is numeric, so a continuous scale is needed to label its breaks with the names
  scale_y_continuous(breaks = justices$id, labels = justices$name) +
  geom_segment(aes(xend = end, yend = id), size = 5) +
  theme(axis.title = element_blank()) +
  theme_fivethirtyeight()
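A minimal base-R sketch of what the sorting and id steps above do, using three hypothetical rows: after sorting by start in decreasing order, nrow(...):1 gives the earliest start id 1. Because every row gets its own id, a justice with two appointments gets two separate y positions instead of one averaged position.

```r
# Hypothetical rows; "A" has two appointments
toy <- data.frame(
  name  = c("A", "B", "A"),
  start = as.Date(c("1790-01-01", "1792-01-01", "1796-01-01"))
)

# Same steps as the answer: sort latest first, then number rows from the top down
toy <- toy[order(toy$start, decreasing = TRUE), ]
toy$id <- nrow(toy):1

# id now increases with start date, one position per appointment
print(toy)
```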

Please also refer to this thread. Good luck!

Hack-R

You can make the name column a factor and use forcats::fct_reorder() to reorder the names by start date. fct_reorder() takes a function used to summarise start within each name; passing min() orders each justice by their earliest start date, so judges with multiple start dates are sorted by the first one. It's only a two-line change: add a mutate() at the beginning of the pipe, and remove the reorder() inside aes().

justices %>% 
  # Order the name levels by each justice's earliest start date
  mutate(name = as.factor(name) %>% fct_reorder(start, min)) %>%
  ggplot(aes(x = name)) +
  geom_segment(aes(xend = name,
                   yend = start,
                   y = end)) +
  coord_flip() + 
  scale_y_date(date_breaks = "20 years", date_labels = "%Y") +
  theme(axis.title = element_blank()) +
  theme_fivethirtyeight()

Created on 2018-06-29 by the reprex package (v0.2.0).

camille