
I have to label the state with an unusually high murder rate using geom_text(): find the murder rate per million and get the state with the highest one.

library(dslabs)   # murders dataset
library(ggplot2)

ggplot(data = murders, aes(x = population/10^6, y = total)) +
  geom_point(aes(color = region)) +
  xlab("population in millions") + ylab("murders") + ggtitle("US murders by state") +
  geom_smooth(method = lm, se = FALSE) + geom_text(aes(label = abb))

This labels every point in the scatter plot. How would I label only the state with the unusually high murder rate?

Maybe this helps: [Color one point and add an annotation in ggplot2?](https://stackoverflow.com/questions/14351608/color-one-point-and-add-an-annotation-in-ggplot2)? – Quinten Sep 25 '22 at 17:53

1 Answer


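You can calculate the murder rate and create a label column that contains the abb only for the state/district with the highest rate and NA for every other row. geom_text() drops rows whose label is NA (with a "Removed ... rows" warning), so only that one point gets a label. Dividing by population gives the rate per person rather than per million, but the ranking, and therefore the maximum, is the same.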

library(tidyverse)
library(dslabs) # for murders dataset

murders %>% 
  mutate(murder_per_capita = total/population) %>% 
  mutate(label = if_else(murder_per_capita == max(murder_per_capita), abb, NA_character_)) %>% 
  ggplot(aes(x = population/10^6, y = total)) + 
  geom_point(aes(color = region)) + 
  xlab("population in millions") + 
  ylab("murders") + 
  ggtitle("US murders by state") + 
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = label))

Created on 2022-09-25 by the reprex package (v2.0.1)
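If you want the rate in the per-million units the question asks for, and want to see which row gets picked, here is a minimal sketch of an alternative. It is an assumption on my part that this variant suits your assignment. Instead of an NA-filled label column, it passes a one-row data frame to geom_text() through its `data` argument, which is the idea behind the linked annotation question. The column name `rate_per_million` and the objects `top` and `p` are illustrative names, not anything from dslabs.

```r
library(dslabs)   # provides the murders data frame
library(ggplot2)

# Murder rate per million residents (illustrative column name)
murders$rate_per_million <- murders$total / murders$population * 1e6

# The single row with the highest rate
top <- murders[which.max(murders$rate_per_million), ]
print(top[, c("state", "abb", "rate_per_million")])

# Give geom_text() only that row; x and y are inherited from the global aes()
p <- ggplot(murders, aes(x = population / 10^6, y = total)) +
  geom_point(aes(color = region)) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  geom_text(data = top, aes(label = abb), vjust = -1) +  # vjust = -1 nudges the label above the point
  xlab("population in millions") +
  ylab("murders") +
  ggtitle("US murders by state")
print(p)
```

Because only one row reaches geom_text(), there are no NA labels and no "Removed rows" warning. The trade-off is that the label data lives outside the main pipeline.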

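However, given the roughly log-normal distribution of these variables, I think it makes more sense to show this plot with log-transformed axes. Because ggplot2 applies scale transformations before computing statistics, geom_smooth() then fits the line to log10(murders) against log10(population). That helps satisfy the basic assumptions of linear regression and also makes it clearer how much of an outlier DC is: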

qqnorm(log10(murders$population), main = "Log-normality of population")

qqnorm(log10(murders$total), main = "Log-normality of murders")


murders %>% 
  mutate(murder_per_capita = total/population) %>% 
  mutate(label = if_else(murder_per_capita == max(murder_per_capita), abb, NA_character_)) %>% 
  ggplot(aes(x = population/10^6, y = total)) + 
  scale_x_log10("population in millions") +
  scale_y_log10("murders") +
  geom_point(aes(color = region)) + 
  ggtitle("US murders by state") + 
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = label))

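To put a number on how far DC sits from the log-log line, here is a minimal sketch that fits the same regression by hand with lm(). It assumes the line drawn by geom_smooth(method = lm) on log10 scales is an ordinary least-squares fit of log10(total) on log10(population / 10^6). The names `fit` and `res` are illustrative.

```r
library(dslabs)   # provides the murders data frame

# Same model geom_smooth(method = lm) fits once both axes are log10-scaled
fit <- lm(log10(total) ~ log10(population / 10^6), data = murders)
print(coef(fit))

# Residuals are on the log10 scale: a residual of 0.5 means roughly
# 10^0.5 (about 3.2) times more murders than the line predicts
res <- setNames(residuals(fit), murders$abb)
print(head(sort(res, decreasing = TRUE), 3))
```

The largest positive residual identifies the state furthest above the fitted line. Unlike the raw per-capita rate, this measures DC against states of similar size.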

Dan Adams