How do I use the stat()-command with categorical data in ggplot2?

Question

This question is about converting the y values in my plot to percentages, much like this question. However, the answers there no longer seem applicable, since we can no longer surround variable names with double dots (as in ..count..) but have to use the stat() function instead.

My plot is a horizontal bar chart created with the following code. The x variable is categorical (city), and the y axis shows the number of observations for each city:

ggplot(fulldata, aes(x = fct_rev(fct_infreq(CITY_LADOK)))) +
  geom_bar() +
  coord_flip()

I want to convert the y values to percentages, preferably without having to create a separate summary table. The help page for calculated aesthetics is... not particularly helpful: it doesn't say whether percentages can be calculated, or how. Extrapolating from its examples, though, I should be able to write something along the lines of:

ggplot(fulldata, aes(x = fct_rev(fct_infreq(CITY_LADOK)))) +
  geom_bar(y = stat(count / sum(count))) +
  coord_flip()

In theory, at least. In practice I get this error:

Error in sum(count) : invalid 'type' (closure) of argument

But what if I scale this down and simply use stat() to reproduce the original plot?

ggplot(fulldata, aes(x = fct_rev(fct_infreq(CITY_LADOK)))) +
  geom_bar(y = stat(count)) +
  coord_flip()

This gives a different error:

Error in rep(value[[k]], length.out = n) : 
  attempt to replicate an object of type 'closure'

It doesn't work with y = stat(bin) or y = stat(identity) either. Can stat() be used at all with categorical variables, and if so, can it be used to calculate percentages?

Excerpt of data:

structure(list(start_date = structure(c(17776, 17776, 17776, 
17776, 17776, 17776, 17776, 17776, 17776, 17776, 17776, 17776, 
17776, 17776, 17776, 17776, 17776, 17776, 17776, 17776), class = "Date"), 
    CITY_LADOK = c("GÖTEBORG", "LILLA_EDET", "GÖTEBORG", "GÖTEBORG", 
    "UDDEVALLA", "SKÖVDE", "VÄSTERÅS", "TROLLHÄTTAN", "ALE", 
    "GÖTEBORG", "GÖTEBORG", "GÖTEBORG", "UPPSALA", "TJÖRN", "TROLLHÄTTAN", 
    "UDDEVALLA", "UDDEVALLA", "KUNGSBACKA", "VÄNERSBORG", "UDDEVALLA"
    )), row.names = c(NA, -20L), groups = structure(list(start_date = structure(17776, class = "Date"), 
    .rows = list(1:20)), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

score 1 · Accepted Answer · answered Nov 14 '19 at 17:40

Magnus, you're quite close, but you need to be careful to use aes() whenever you map a variable to an aesthetic. Any time the value you give a ggplot2 function depends on the data, whether it is a column or a variable computed by the stat (such as count), it has to go inside aes(). A small example:

library(tidyverse)
df <- tibble(
  city = c(rep("A", 5), rep("B", 2), "C", "D", "E")
)

# Simplified count will work, but make sure to use aes()
df %>%
  ggplot(aes(x = fct_rev(fct_infreq(city)))) +
  geom_bar(aes(y = stat(count))) +
  coord_flip()

# Percentage will work as well, but take care with aes() and parentheses
df %>%
  ggplot(aes(x = fct_rev(fct_infreq(city)))) +
  geom_bar(aes(y = stat(count) / sum(stat(count)))) +
  coord_flip()

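# Can also request the proportion directly. prop is computed within each
# group, and with a discrete x every bar is its own group by default (so every
# prop would be 1); a single constant group (here NA) puts all bars in one group.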
df %>%
  ggplot(aes(x = fct_rev(fct_infreq(city)))) +
  geom_bar(aes(y = stat(prop), group = NA)) +
  coord_flip()
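For reference, a sketch of the percentage approach applied to the excerpt from the question. It assumes ggplot2 3.3.0 or later, where after_stat() is the preferred spelling of stat() (newer releases steer users away from both stat() and the ..count.. notation). It also adds scale_y_continuous(labels = scales::percent) so the axis reads as percentages rather than 0-1 proportions. The data frame is rebuilt by hand from the CITY_LADOK values in the excerpt, ignoring start_date and the grouping.

```r
library(ggplot2)
library(forcats)

# Hand-built stand-in for the question's `fulldata`: only the CITY_LADOK
# column from the excerpt (20 rows), without start_date or dplyr grouping.
fulldata <- data.frame(CITY_LADOK = c(
  "GÖTEBORG", "LILLA_EDET", "GÖTEBORG", "GÖTEBORG", "UDDEVALLA",
  "SKÖVDE", "VÄSTERÅS", "TROLLHÄTTAN", "ALE", "GÖTEBORG",
  "GÖTEBORG", "GÖTEBORG", "UPPSALA", "TJÖRN", "TROLLHÄTTAN",
  "UDDEVALLA", "UDDEVALLA", "KUNGSBACKA", "VÄNERSBORG", "UDDEVALLA"
))

p <- ggplot(fulldata, aes(x = fct_rev(fct_infreq(CITY_LADOK)))) +
  # after_stat() evaluates its expression on stat_count()'s output, where
  # `count` is the number of rows per city and sum(count) is the layer total.
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  # Only the axis labels change: the bar heights stay as 0-1 proportions.
  scale_y_continuous(labels = scales::percent) +
  labs(x = "City", y = "Share of observations") +
  coord_flip()

p
```

With this excerpt GÖTEBORG accounts for 6 of the 20 rows, so its bar should reach 30%.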

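Note also that stat() accesses computed variables, i.e. the columns a layer's stat calculates, such as count and prop for geom_bar() (which uses stat_count()). The stat_*() functions, such as stat_identity(), stat_count(), and stat_bin(), are something different: they describe different ways for ggplot2 to aggregate the data. That is why y = stat(bin) and y = stat(identity) don't work: bin and identity are not variables computed by stat_count().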
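To see where the two error messages in the question come from, here is a small sketch in plain R. It assumes dplyr is attached (as library(tidyverse) does) and ggplot2 3.3.0 or later for after_stat(). Outside aes(), arguments are evaluated immediately, so count is not the stat's computed column but whatever object named count is in scope, here the function dplyr::count().

```r
library(dplyr)    # assumption: attached, as library(tidyverse) would do
library(ggplot2)

# Outside aes(), `count` is looked up like any other R object. With dplyr
# attached it resolves to the function dplyr::count(), not a column of counts.
is.function(count)

# sum() cannot add up a function, which is the first error in the question.
msg <- tryCatch(sum(count), error = function(e) conditionMessage(e))
msg

# after_stat() (like stat()) simply returns its argument when called directly,
# so geom_bar(y = after_stat(count)) would hand the count() function itself to
# the geom as a fixed y value, which ggplot2 then fails to replicate.
identical(after_stat(count), count)
```

Wrapping the same expression in aes() defers its evaluation until after the stat has run, when count refers to the computed column instead.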

Thank you for your pedagogical (is that an expression?) answer. I'm a bit uncertain what a "dynamic value" entails, but what I take home from this is that (1) in order to access variables inside geoms I first need to use the aes() function, and (2) I can't use calculated values directly, but need to apply the stat() function to each separate use. — Magnus, Nov 15 '19 at 09:56
Yep. If instead you were providing something simple static like `color = "blue"` as an argument to `geom_bar`, you wouldn't need to wrap it in `aes()`. — ravic_, Nov 15 '19 at 14:42
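A minimal sketch of that distinction, using a hypothetical three-row data frame: a fixed fill colour is set as a plain argument to the geom, while a fill that depends on the data is mapped inside aes().

```r
library(ggplot2)

# Hypothetical toy data, for illustration only.
df <- data.frame(city = c("A", "A", "B"))

# Setting: a fixed value that does not depend on the data goes outside aes().
p_set <- ggplot(df, aes(x = city)) +
  geom_bar(fill = "steelblue")

# Mapping: a value that varies with the data goes inside aes(),
# so each city gets its own colour from the fill scale.
p_map <- ggplot(df, aes(x = city)) +
  geom_bar(aes(fill = city))
```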
