
I have data with the following columns:

  • state: abbreviated state names (AK, NY, NJ, etc.)
  • subs: Number of magazine subscribers per state
  • income: Average income for that state
  • party: Political party (1 = Democrat, 0 = Republican, 0.5 = unknown)

Sample data is available here: https://raw.githubusercontent.com/vindication09/sample_data/master/sample_data.csv

I plotted political affiliation using the following code and packages. full_join3 is the name of my dataset.

library(ggplot2)
library(usmap)

usmap::plot_usmap(data = full_join3, values = "party", lines = "red") + 
    scale_fill_discrete(
        name = "Political Affiliation Based on 2016 election", 
        label = scales::comma
    ) + 
    theme(legend.position = "right")

I want to label each state with its number of subscribers and, if possible, its average income. How would I go about doing that?

Calum You
Vindication09
  • Hi! Please review [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including supplying your data with `dput` so that we don't have to reconstruct your data by hand. This makes your question more likely to get answered and a better resource for other users! – Calum You Jul 02 '18 at 19:03
  • Hi, thank you for the feedback. I made some sample data and made it available on GitHub. – Vindication09 Jul 02 '18 at 19:20

1 Answer


Alright, here's my best stab at achieving this, though I don't think there is a good way to get all these numbers onto the map while keeping it somewhat readable. I would suggest using a different visualisation to convey all of this information at once.

The approach is basically to bypass usmap, since I cannot add parameters to plot_usmap without editing its source code. Instead, we build a geometry with Alaska and Hawaii as insets from the data in the fiftystater package, then join it to the data provided via a reference table of state names and abbreviations.
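To make the reference-table idea concrete, here is a minimal base R sketch of just the join step. It assumes nothing beyond R's built-in `state.abb` and `state.name` vectors; the names `lookup`, `subs_tbl` and `joined` are made up for this illustration, and the three data rows are the first three rows of the sample data.

```r
# Lookup table from base R's built-in state.abb / state.name vectors,
# plus DC, which those vectors do not include.
lookup <- data.frame(
  state.abb  = c(state.abb, "DC"),
  state.name = tolower(c(state.name, "District of Columbia")),
  stringsAsFactors = FALSE
)

# First three rows of the sample data (illustrative subset only).
subs_tbl <- data.frame(
  state  = c("AK", "AL", "AR"),
  subs   = c(43L, 373L, 604L),
  income = c(81360L, 36595L, 51963L),
  stringsAsFactors = FALSE
)

# Left join: keep every state in the lookup, attach data where available.
joined <- merge(lookup, subs_tbl,
                by.x = "state.abb", by.y = "state", all.x = TRUE)

head(joined, 3)
```

The lower-case full names matter because the `id` column in `fifty_states` uses lower-case state names, so that is the key the geometry is joined on.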

Then, plotting is a matter of using geom_sf (at the time of writing, only in the development version of ggplot2) and geom_label_repel from the ggrepel package. We pass geom_label_repel a preconstructed data frame that holds one label per state.
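As a small sketch of what the "preconstructed" label text can look like, the snippet below builds the label strings in base R. It is a variation rather than the exact code used in this answer: it labels by abbreviation and adds a thousands separator to the income, and the name `label_df` is made up for the illustration. The two rows are taken from the sample data.

```r
# Two rows from the sample data (illustrative subset only).
label_df <- data.frame(
  state  = c("AK", "AL"),
  subs   = c(43L, 373L),
  income = c(81360L, 36595L),
  stringsAsFactors = FALSE
)

# One label per state: abbreviation on the first line,
# subscribers and income (with a thousands separator) on the second.
label_df$text <- paste0(
  label_df$state, "\n",
  label_df$subs, " subs, $",
  format(label_df$income, big.mark = ",", trim = TRUE)
)

cat(label_df$text, sep = "\n\n")
```

Keeping the label in its own column means the plotting call only has to map `label = text`, and the wording can be changed without touching the plot code.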

Again though, I would prefer an alternative visualisation that skips the map and simply shows the relationships between the variables more clearly, such as the scatter plot of subs against income produced by the second ggplot call. It makes it much more obvious that states with many subscribers lean Republican across income levels. That said, I would check the original data (Hawaii is Republican? Oklahoma is Democrat?).
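If a quick numeric check is wanted alongside a plot, one option (not part of the approach above, just a sketch) is to summarise subscribers by party with base R's `aggregate`. The six rows below are the first six rows of the sample data, so the numbers only illustrate the mechanics and say nothing about the full table; `sample_rows` and `by_party` are names invented for this example.

```r
# First six rows of the sample data (illustrative subset only).
sample_rows <- data.frame(
  state = c("AK", "AL", "AR", "AZ", "CA", "CO"),
  subs  = c(43L, 373L, 604L, 431L, 157L, 524L),
  party = c(0, 0, 1, 0.5, 1, 0)
)

# Recode the numeric party codes with explicit levels so the labels
# cannot be mismatched if a code happens to be absent from the data.
sample_rows$party <- factor(
  sample_rows$party,
  levels = c(0, 0.5, 1),
  labels = c("Republican", "Unknown", "Democrat")
)

# Median number of subscribers per party.
by_party <- aggregate(subs ~ party, data = sample_rows, FUN = median)
print(by_party)
```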

library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
library(fiftystater)
library(ggrepel)
tbl <- structure(list(state = c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY", "DC"), subs = c(43L, 373L, 604L, 431L, 157L, 524L, 682L, 178L, 594L, 395L, 76L, 492L, 597L, 686L, 282L, 27L, 560L, 528L, 309L, 306L, 101L, 139L, 414L, 280L, 22L, 548L, 82L, 675L, 684L, 598L, 66L, 653L, 541L, 636L, 530L, 332L, 21L, 469L, 341L, 456L, 278L, 153L, 499L, 700L, 223L, 222L, 305L, 20L, 321L, 232L, 107L), income = c(81360L, 36595L, 51963L, 47673L, 56776L, 61959L, 37456L, 64224L, 56211L, 25183L, 44677L, 78116L, 35134L, 85910L, 81341L, 52409L, 75060L, 55098L, 56239L, 84138L, 37589L, 50006L, 88730L, 71527L, 34506L, 76364L, 89672L, 79442L, 42263L, 73869L, 65454L, 80625L, 60519L, 35125L, 60869L, 64727L, 86541L, 75562L, 50824L, 44414L, 26103L, 32962L, 61337L, 48314L, 25417L, 35721L, 34247L, 86608L, 64030L, 61089L, 37934L), party = c(0, 0, 1, 0.5, 1, 0, 0, 0.5, 0, 1, 0, 0.5, 0.5, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0.5, 0, 0.5, 0.5, 1, 0, 0.5, 0, 1, 0, 1, 0, 0.5, 1, 1, 0, 0.5, 0.5, 1, 0.5, 0, 0.5, 0.5, 0, 0, 0, 1, 1, 0.5)), row.names = c(NA, -51L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(state = structure(list(), class = c("collector_character", "collector")), subs = structure(list(), class = c("collector_integer", "collector")), income = structure(list(), class = c("collector_integer", "collector")), party = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

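# Reference table mapping state abbreviations to lower-case state names
# (state.abb and state.name ship with base R; DC is added by hand)
names_abbs <- tibble(state.abb, state.name) %>%
  mutate(state.name = str_to_lower(state.name)) %>%
  add_row(state.abb = "DC", state.name = "district of columbia")

# Turn the fifty_states outline points into one polygon per state piece,
# then attach the abbreviations and the subscriber data
sf_fifty <- st_as_sf(fifty_states, coords = c("long", "lat")) %>%
  group_by(id, piece) %>%
  summarize(do_union = FALSE) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  left_join(names_abbs, by = c("id" = "state.name")) %>%
  left_join(tbl, by = c("state.abb" = "state")) %>%
  mutate(
    # Explicit levels keep the labels aligned with the numeric codes
    party = factor(party, levels = c(0, 0.5, 1), labels = c("Republican", "Unknown", "Democrat")),
    # Centroid coordinates of each piece, used to position the labels
    lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
    lat = map_dbl(geometry, ~st_centroid(.x)[[2]])
  )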

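# One label per state: keep only the largest piece of each state
# so that multi-part states (islands etc.) are labelled once
labels <- sf_fifty %>%
  mutate(area = st_area(geometry)) %>%
  group_by(state.abb) %>%
  top_n(1, area) %>%
  mutate(text = str_c(str_to_title(id), "\n", subs, " subs, $", income))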

# Map: states filled by party, with repelled labels at the centroids
ggplot(sf_fifty) +
  theme_minimal() +
  geom_sf(aes(fill = party)) +
  coord_sf(datum = NA) +
  geom_label_repel(
    data = labels,
    mapping = aes(label = text, x = lon, y = lat),
    size = 2
    ) +
  scale_fill_brewer(
    type = "div",
    palette = "RdYlBu",
    name = "Political Affiliation Based on 2016 Election"
    ) +
  theme(
    legend.position = "bottom",
    axis.title = element_blank()
    )

# Alternative: scatter plot of subscribers against income, coloured by party
ggplot(labels, aes(x = income, y = subs)) +
  theme_minimal() +
  geom_point(aes(colour = party), size = 3) +
  scale_colour_discrete(name = "Political Affiliation Based on 2016 Election") +
  geom_text_repel(aes(label = state.abb)) +
  theme(legend.position = "bottom")

Created on 2018-07-02 by the reprex package (v0.2.0).

Calum You
  • Hi Calum, this is incredible. Thank you so much for your answer. I am trying to recreate this process but I am getting a blank map at the end. All the labels are there but the actual map of the United States is missing. Do you have any thoughts on why this could be happening? – Vindication09 Jul 02 '18 at 22:15
  • Thank you for getting back to me. I am indeed using ggplot2 version 2.2.1. Should I be running the development version of ggplot2? The data I provided is randomly generated. It is not the real data, hence why some political affiliations will not make sense. The real data is of the same structure. There are no error messages. The labels for each state appear but the actual US map does not. – Vindication09 Jul 03 '18 at 12:18
  • I fixed it! I removed ggplot2 and reinstalled it, and it worked. – Vindication09 Jul 03 '18 at 13:31
  • Yep, `geom_sf` hasn't moved to the release version yet. – Calum You Jul 03 '18 at 16:57
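
For anyone who runs into the same blank map, a quick way to check whether the installed ggplot2 provides `geom_sf` is sketched below. It only uses base R functions (`requireNamespace`, `getNamespaceExports`, `packageVersion`); the variable name `has_geom_sf` is made up for the example.

```r
# TRUE only if ggplot2 is installed and its namespace exports geom_sf.
has_geom_sf <- requireNamespace("ggplot2", quietly = TRUE) &&
  "geom_sf" %in% getNamespaceExports("ggplot2")

if (has_geom_sf) {
  message("geom_sf is available in ggplot2 ",
          as.character(utils::packageVersion("ggplot2")))
} else {
  message("geom_sf not found: install or update ggplot2 before plotting sf objects")
}
```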