Bar plot by group in descending order of categorical variable count within each group

Question

Based on the data and code below, how can I sort the bars in descending order for each group?

Purpose: to show the count of each FF in descending order for each city, by district. This will show me which city in which district has the highest count of a particular FF. FF is the flood factor (risk of flooding), ranging from 1 to 10, so the plot will show the dominant FF in each city in a district.

Code + data:

df_sample = structure(list(City = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D", "D", 
"D", "D", "D", "D", "D", "D", "D", "D", "D", "E", "E", "E", "E", 
"E", "E", "E", "E", "E", "F", "F", "F", "F", "F", "F", "F", "F", 
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"
), District = c("D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", 
"D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", 
"D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", 
"D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", 
"D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", 
"D1", "D1", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", 
"D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", 
"D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", 
"D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", "D2", 
"D2", "D2", "D2"), FF = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
-99L))

# Packages needed for %>%, arrange() and ggplot()
library(dplyr)
library(ggplot2)

# Plot
df_sample %>% 
  arrange(desc(FF)) %>% 
  ggplot(aes(x = City,
             y = FF,
             fill = District)) +
  geom_bar(position = "dodge",
           stat = "identity")

Relevant: https://stackoverflow.com/q/12774210/3358272, https://stackoverflow.com/q/18401931/3358272, and https://stackoverflow.com/q/44350031/3358272 — r2evans, Dec 14 '22 at 16:46
By *"count of each `FF`"* do you mean *total* of each `FF` by groups of City and District? — Rui Barradas, Dec 14 '22 at 16:49
@RuiBarradas, since `FF` is categorical, total means total count: how many times each `FF` appears in each city by district, arranged in descending order. — Ed_Gravy, Dec 14 '22 at 16:51
@Quinten honestly I am also thinking what'd be the most suitable plot for such a purpose. So, I am not entirely sure whether a bar plot would be the right way to do it or not. — Ed_Gravy, Dec 14 '22 at 17:01

score 2 · Answer 1 · answered Dec 14 '22 at 16:42

You can reorder `City` as a factor:

library(dplyr)
library(ggplot2)
library(forcats)

df_sample %>% 
  mutate(City = fct_rev(fct_reorder2(City,FF,District))) %>%
  # arrange(desc(FF)) %>%
  ggplot(aes(x = City,
             y = FF,
             fill = District)) +
  geom_bar(position = "dodge",
           stat = "identity")

Vinícius Félix


Sorry, I meant descending order by count of each `FF` – Ed_Gravy Dec 14 '22 at 16:45

Rui Barradas · Accepted Answer · 2022-12-14T18:26:37.200

I am not sure that this is what is asked for.

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

df_sample %>%
  group_by(District, City) %>%
  summarise(FF = sum(FF)) %>%
  mutate(City = reorder(City, FF, decreasing = TRUE)) %>%
  ggplot(aes(x = City, FF, fill = District)) +
  geom_col(position = "dodge")
#> `summarise()` has grouped output by 'District'. You can override using the
#> `.groups` argument.

Created on 2022-12-14 with reprex v2.0.2

Edit

Following the OP's comment below, here is a plot of counts.

The counts can be computed as follows.

df_sample %>% count(District, City, FF)
#>   District City FF  n
#> 1       D1    A  1 29
#> 2       D1    B  2 20
#> 3       D1    C  3  5
#> 4       D2    D  1 15
#> 5       D2    E  2  9
#> 6       D2    F  3 21

Created on 2022-12-14 with reprex v2.0.2
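Since the stated goal is to find the dominant `FF` in each city, one possible sketch (not part of the original answer) keeps only the most frequent `FF` per city from such a count table. The names `df_small` and `dominant` are hypothetical; `df_small` is an assumed compact stand-in that rebuilds the same 99 rows with `rep()`.

```r
library(dplyr)

# Assumption: compact stand-in with the same counts as df_sample
# (A = 29, B = 20, C = 5 in D1; D = 15, E = 9, F = 21 in D2).
df_small <- data.frame(
  City     = rep(c("A", "B", "C", "D", "E", "F"), times = c(29, 20, 5, 15, 9, 21)),
  District = rep(c("D1", "D2"), times = c(54, 45)),
  FF       = rep(c(1L, 2L, 3L, 1L, 2L, 3L), times = c(29, 20, 5, 15, 9, 21))
)

# Count each FF per city, then keep the single most frequent FF in every city.
dominant <- df_small %>%
  count(District, City, FF) %>%
  group_by(District, City) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

dominant
```

In this sample every city has a single `FF` value, so the result has the same six rows as the count table. With several `FF` values per city, only the most frequent one would remain.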

And here is the bar plot.

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

df_sample %>%
  group_by(District, City) %>%
  summarise(FF = n()) %>%
  mutate(City = reorder(City, FF, decreasing = TRUE)) %>%
  ggplot(aes(x = City, FF, fill = District)) +
  geom_col(position = "dodge")
#> `summarise()` has grouped output by 'District'. You can override using the
#> `.groups` argument.

Created on 2022-12-14 with reprex v2.0.2
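Whether `reorder()` honours `decreasing = TRUE` depends on the R version (I believe the argument arrived in R 4.3.0, but check `?reorder`). A minimal sketch that does not rely on it sorts the rows first and then fixes the factor levels in order of appearance. This is an alternative technique, not the code used above; `df_small` and `counts` are hypothetical names, and `df_small` is an assumed stand-in with the same counts as `df_sample`.

```r
library(dplyr)
library(ggplot2)

# Assumption: compact stand-in with the same counts as df_sample.
df_small <- data.frame(
  City     = rep(c("A", "B", "C", "D", "E", "F"), times = c(29, 20, 5, 15, 9, 21)),
  District = rep(c("D1", "D2"), times = c(54, 45)),
  FF       = rep(c(1L, 2L, 3L, 1L, 2L, 3L), times = c(29, 20, 5, 15, 9, 21))
)

counts <- df_small %>%
  count(District, City, FF) %>%
  # Sort by district, then by descending count within each district.
  arrange(District, desc(n)) %>%
  # Freeze that row order as the factor level order used on the x-axis.
  mutate(City = factor(City, levels = unique(City)))

p <- ggplot(counts, aes(x = City, y = n, fill = District)) +
  geom_col()
p
```

Because each city belongs to exactly one district here, fixing the levels after `arrange()` gives a descending order inside each district block on the axis.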

Edit 2

And here is the plot with the counts over the bars and the values of FF in the middle of the bars. This is probably too complicated but it works.

df_sample %>%
  mutate(FF = factor(FF)) %>%
  group_by(District, City, FF) %>%
  summarise(n = n()) %>% 
  group_by(District) %>%
  mutate(
    tmp = as.integer(factor(n)),
    City = reorder(City, tmp, decreasing = TRUE)
  ) %>%
  select(-tmp) %>%
  ggplot(aes(x = City, n, fill = District)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = n), vjust = -1) +
  geom_text(aes(y = n/2, label = FF)) +
  # geom_label(aes(y = n/2, label = FF), fill = "white") +
  ylim(0, 31)
#> `summarise()` has grouped output by 'District', 'City'. You can override using
#> the `.groups` argument.

Created on 2022-12-14 with reprex v2.0.2
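A possibly simpler variant, offered as a sketch rather than a replacement: skip the temporary rank column, order the rows with `arrange()`, and give each district its own panel. The names `df_small` and `labelled` are hypothetical, the faceting is a design choice not found in the answer above, and `df_small` is an assumed stand-in with the same counts as `df_sample`.

```r
library(dplyr)
library(ggplot2)

# Assumption: compact stand-in with the same counts as df_sample.
df_small <- data.frame(
  City     = rep(c("A", "B", "C", "D", "E", "F"), times = c(29, 20, 5, 15, 9, 21)),
  District = rep(c("D1", "D2"), times = c(54, 45)),
  FF       = rep(c(1L, 2L, 3L, 1L, 2L, 3L), times = c(29, 20, 5, 15, 9, 21))
)

labelled <- df_small %>%
  count(District, City, FF) %>%
  arrange(District, desc(n)) %>%
  mutate(City = factor(City, levels = unique(City)))

p <- ggplot(labelled, aes(x = City, y = n, fill = District)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.5) +        # count above each bar
  geom_text(aes(y = n / 2, label = FF)) +          # FF value mid-bar
  facet_wrap(~District, scales = "free_x") +       # one panel per district
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))  # headroom for labels
p
```

The count stays in its own column `n`, so `FF` remains available for the mid-bar label, which is the separation asked for in the comments.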

edited Dec 14 '22 at 18:26

answered Dec 14 '22 at 16:48

Rui Barradas


Instead of `sum` I am interested in `count` – Ed_Gravy Dec 14 '22 at 16:52
Oh, so now the count is on the `y-axis`? – Ed_Gravy Dec 14 '22 at 17:03
@Ed_Gravy Yes, just a moment, will edit with the actual values. Done. – Rui Barradas Dec 14 '22 at 17:06
Cool, thanks. Now how can I add a label (or something similar) to also show the total count (n) of each `FF` in the plot? – Ed_Gravy Dec 14 '22 at 17:09
@Ed_Gravy Just add `geom_text(aes(label = FF), vjust=-1)` at the end, after `geom_col`. – Rui Barradas Dec 14 '22 at 17:17
Awesome, one last question, how can I also add the `FF` i.e. `1`, `2` and `3` as a label for each respective bar? – Ed_Gravy Dec 14 '22 at 17:19
@Ed_Gravy With `geom_text` or `geom_label`. But with `vjust = 0.5` to put them in the middle of the bars. The main difference is that `geom_label` puts the labels in a (white) rectangle. – Rui Barradas Dec 14 '22 at 17:22
But now `FF = n()`, so which variable do I put in `geom_text()`? Shouldn't the count be in its own column, say `N`, leaving `FF` as it is? That way both the count and `FF` labels can be added to the plot, right? – Ed_Gravy Dec 14 '22 at 17:29
@Ed_Gravy Done, see edit 2 now. I bet there's a simpler way, but it works. – Rui Barradas Dec 14 '22 at 18:26
