To aggregate across phrases as in the original question, I did
library(dplyr)

anti <-
    hate_crime %>%
    filter(DATA_YEAR %in% c("2009", "2017")) %>%
    mutate(
        ANTI_WHITE = grepl("Anti-White", BIAS_DESC),
        ANTI_BLACK = grepl("Anti-Black", BIAS_DESC),
        ANTI_HISPANIC = grepl("Anti-Hispanic", BIAS_DESC)
    ) %>%
    select(DATA_YEAR, starts_with("ANTI"))
I then created the counts of each occurrence with group_by() and summarize_all() (noting that the sum() of a logical vector is the number of TRUE occurrences), and used pivot_longer() to create a 'tidy' summary
anti %>%
    group_by(DATA_YEAR) %>%
    summarize_all(~ sum(.)) %>%
    tidyr::pivot_longer(starts_with("ANTI"), names_to = "BIAS", values_to = "COUNT")
The result is something like (there were errors importing the data with read_csv() that I did not investigate)
# A tibble: 6 x 3
  DATA_YEAR BIAS          COUNT
      <dbl> <chr>         <int>
1      2009 ANTI_WHITE      539
2      2009 ANTI_BLACK     2300
3      2009 ANTI_HISPANIC   486
4      2017 ANTI_WHITE      722
5      2017 ANTI_BLACK     2101
6      2017 ANTI_HISPANIC   444
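The counting step above relies on grepl() returning a logical vector and on sum() treating TRUE as 1. A minimal sketch on made-up values (the bias_desc vector here is hypothetical, not taken from the real data):

```r
# Toy bias descriptions (made up for illustration, not from the real data)
bias_desc <- c(
    "Anti-White",
    "Anti-Black or African American",
    "Anti-White;Anti-Asian"
)

# grepl() returns one logical value per element of bias_desc
is_anti_white <- grepl("Anti-White", bias_desc)

# TRUE counts as 1 and FALSE as 0, so sum() is the number of matches
n_anti_white <- sum(is_anti_white)
n_anti_white
```

Note that the compound entry "Anti-White;Anti-Asian" is matched too, because grepl() looks for the pattern anywhere in the string.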
Visualization seems like a second, separate, question.
The code can be made a little simpler by defining a function
n_with_bias <- function(x, bias)
    sum(grepl(bias, x))
and then avoiding the need to separately mutate the data
hate_crime %>%
    filter(DATA_YEAR %in% c("2009", "2017")) %>%
    group_by(DATA_YEAR) %>%
    summarize(
        ANTI_WHITE = n_with_bias(BIAS_DESC, "Anti-White"),
        ANTI_BLACK = n_with_bias(BIAS_DESC, "Anti-Black"),
        ANTI_HISPANIC = n_with_bias(BIAS_DESC, "Anti-Hispanic")
    ) %>%
    tidyr::pivot_longer(starts_with("ANTI"), names_to = "BIAS", values_to = "N")
On the other hand, a base R approach might create vectors for the years of interest and all biases (using strsplit() to isolate the components of the compound biases)
years <- c("2009", "2017")
biases <- unique(unlist(strsplit(hate_crime$BIAS_DESC, ";")))
then create vectors of biases in each year of interest
bias_by_year <- split(hate_crime$BIAS_DESC, hate_crime$DATA_YEAR)[years]
and iterate over each year and bias (nested iteration can be inefficient when there are many elements, e.g., tens of thousands, but that's not a concern here)
sapply(bias_by_year, function(bias) sapply(biases, n_with_bias, x = bias))
The result is a matrix (sapply() simplifies to a matrix here, not a data.frame) with all biases in each year
                                                          2009 2017
Anti-Black or African American                            2300 2101
Anti-White                                                 539  722
Anti-Jewish                                                932  983
Anti-Arab                                                    0  106
Anti-Protestant                                             38   42
Anti-Other Religion                                        111   85
Anti-Islamic (Muslim)                                        0    0
Anti-Gay (Male)                                              0    0
Anti-Asian                                                 128  133
Anti-Catholic                                               52   72
Anti-Heterosexual                                           21   33
Anti-Hispanic or Latino                                    486  444
Anti-Other Race/Ethnicity/Ancestry                         296  280
Anti-Multiple Religions, Group                              48   52
Anti-Multiple Races, Group                                 180  202
Anti-Lesbian (Female)                                        0    0
Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)    0    0
Anti-American Indian or Alaska Native                       68  244
Anti-Atheism/Agnosticism                                    10    6
Anti-Bisexual                                               24   24
Anti-Physical Disability                                    24   66
Anti-Mental Disability                                      70   89
Anti-Gender Non-Conforming                                   0   13
Anti-Female                                                  0   48
Anti-Transgender                                             0  117
Anti-Native Hawaiian or Other Pacific Islander               0   15
Anti-Male                                                    0   25
Anti-Jehovah's Witness                                       0    7
Anti-Mormon                                                  0   12
Anti-Buddhist                                                0   15
Anti-Sikh                                                    0   18
Anti-Other Christian                                         0   24
Anti-Hindu                                                   0   10
Anti-Eastern Orthodox (Russian, Greek, Other)                0    0
Unknown (offender's motivation not known)                    0    0
This avoids the need to enter each bias in the summarize() step. I'm not sure how to do that computation in a readable tidy-style analysis.
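One possible tidy-style sketch, offered tentatively: split the compound biases into one row per component with tidyr::separate_rows(), count by year and bias, and spread the years into columns. The toy tibble below is a hypothetical stand-in for hate_crime with only the two columns used here, and it assumes components are separated by ";" with no surrounding whitespace.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for hate_crime (made-up rows, for illustration only)
toy <- tibble(
    DATA_YEAR = c(2009, 2009, 2017, 2017),
    BIAS_DESC = c(
        "Anti-White", "Anti-White;Anti-Gay (Male)",
        "Anti-Gay (Male)", "Anti-Asian"
    )
)

tidy_counts <-
    toy %>%
    filter(DATA_YEAR %in% c(2009, 2017)) %>%
    # one row per component of a compound bias
    separate_rows(BIAS_DESC, sep = ";") %>%
    # number of rows for each year / bias combination
    count(DATA_YEAR, BIAS = BIAS_DESC, name = "N") %>%
    # one column per year; combinations that never occur become 0
    pivot_wider(names_from = DATA_YEAR, values_from = N, values_fill = 0)

tidy_counts
```

Unlike the grepl()-based counts, this matches whole components exactly rather than substrings, so the numbers need not agree with the table above in every row.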
Note that in the table above any bias with a ( has zeros in both years. This is because grepl() interprets its pattern as a regular expression, where ( is a grouping symbol rather than a literal character; fix this by adding fixed = TRUE
n_with_bias <- function(x, bias)
    sum(grepl(bias, x, fixed = TRUE))
and the updated result is
                                                          2009 2017
Anti-Black or African American                            2300 2101
Anti-White                                                 539  722
Anti-Jewish                                                932  983
Anti-Arab                                                    0  106
Anti-Protestant                                             38   42
Anti-Other Religion                                        111   85
Anti-Islamic (Muslim)                                      107  284
Anti-Gay (Male)                                            688  692
Anti-Asian                                                 128  133
Anti-Catholic                                               52   72
Anti-Heterosexual                                           21   33
Anti-Hispanic or Latino                                    486  444
Anti-Other Race/Ethnicity/Ancestry                         296  280
Anti-Multiple Religions, Group                              48   52
Anti-Multiple Races, Group                                 180  202
Anti-Lesbian (Female)                                      186  133
Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)  311  287
Anti-American Indian or Alaska Native                       68  244
Anti-Atheism/Agnosticism                                    10    6
Anti-Bisexual                                               24   24
Anti-Physical Disability                                    24   66
Anti-Mental Disability                                      70   89
Anti-Gender Non-Conforming                                   0   13
Anti-Female                                                  0   48
Anti-Transgender                                             0  117
Anti-Native Hawaiian or Other Pacific Islander               0   15
Anti-Male                                                    0   25
Anti-Jehovah's Witness                                       0    7
Anti-Mormon                                                  0   12
Anti-Buddhist                                                0   15
Anti-Sikh                                                    0   18
Anti-Other Christian                                         0   24
Anti-Hindu                                                   0   10
Anti-Eastern Orthodox (Russian, Greek, Other)                0   22
Unknown (offender's motivation not known)                    0    0
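The difference between regular-expression and fixed matching can be checked directly on a single bias. This is a small sketch using one of the bias strings from the table; the variable names are only for illustration.

```r
bias <- "Anti-Gay (Male)"

# As a regular expression, "(Male)" is a group that matches the text "Male",
# so the pattern looks for "Anti-Gay Male" and does not match the string itself
regex_match <- grepl(bias, bias)

# With fixed = TRUE the pattern is treated as literal text
fixed_match <- grepl(bias, bias, fixed = TRUE)

c(regex = regex_match, fixed = fixed_match)
```

The first value is FALSE and the second TRUE, which is why the parenthesized biases only get non-zero counts once fixed = TRUE is used.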