
I'm currently analyzing a dataset and need help with basic data preparation and with showing the data in a rather complex bar plot. The diagram should look similar to the attached image, but of course with different variables.

Let's just use a mock data set for illustration:

df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  district = c("1", "1", "2", "3", "2", "1", "1", "2", "3", "2"),
  f1 = c("1", "2", "3", "1", "2", "3", "1", "2", "2", "3")
)

district = The city has 3 different districts
f1 = First question of the survey, with 3 different categories

I want to show the percentage of each category per district and plot it similar to the plot in the image. First I want to display the overall percentage (for the city), and then the percentage per district, all in the same plot!

I'm grateful for any help. Thanks a lot.

I want a plot similar to this one

kehricht

1 Answer


As a first step this requires, or at least I would recommend, computing the counts and percentages. After that it's pretty straightforward to create a stacked bar chart using ggplot2:

library(dplyr)
library(ggplot2)

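df |>
  # Number of respondents per district and answer category
  count(district, f1) |>
  # Share of each answer within its district (.by needs dplyr >= 1.1.0)
  mutate(pct = prop.table(n), .by = district) |>
  # One horizontal bar per district, stacked by answer category
  ggplot(aes(pct, district, fill = f1)) +
  geom_col()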
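
To see what the plot is built from, the counting step can be run on its own. The following is a minimal sketch using the mock data from the question; the name `pct_tbl` is only illustrative, and `dplyr::count` is namespace-qualified as a precaution against other packages that export a `count` function. It assumes R >= 4.1 (for `|>`) and dplyr >= 1.1.0 (for `.by`).

```r
library(dplyr)

# Mock data from the question
df <- data.frame(
  id = 1:10,
  district = c("1", "1", "2", "3", "2", "1", "1", "2", "3", "2"),
  f1 = c("1", "2", "3", "1", "2", "3", "1", "2", "2", "3")
)

# Counts per district and answer, then each count's share within its district
pct_tbl <- df |>
  dplyr::count(district, f1) |>
  mutate(pct = prop.table(n), .by = district)

# One row per district/answer combination that occurs in the data
print(pct_tbl)
```

Combinations that never occur (for example answer "1" in district "2") do not appear in the table, so the corresponding bar simply has no segment for them.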

EDIT: One option to add the overall results would be to first "clone" your data and set the district equal to "city" or ..., and second to bind the clone to your original dataset using e.g. dplyr::bind_rows:

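# Copy of the data with every row relabelled as "city",
# stacked on top of the original per-district rows
df <- df |>
  mutate(district = "city") |>
  bind_rows(df)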

df |> 
  count(district, f1) |> 
  mutate(pct = prop.table(n), .by = district) |> 
  ggplot(aes(pct, district, fill = f1)) +
  geom_col()
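
As a sanity check on the "clone and bind" idea, the sketch below stores the combined data under a separate, purely illustrative name (`df_all`) so the original `df` is not overwritten, and verifies that the shares within each bar add up to 1. The percent axis labels via `scales::label_percent()` are an optional extra and not part of the code above.

```r
library(dplyr)
library(ggplot2)

# Mock data from the question
df <- data.frame(
  id = 1:10,
  district = c("1", "1", "2", "3", "2", "1", "1", "2", "3", "2"),
  f1 = c("1", "2", "3", "1", "2", "3", "1", "2", "2", "3")
)

# Original rows plus a copy in which every row belongs to "city"
df_all <- df |>
  mutate(district = "city") |>
  bind_rows(df)

pct_all <- df_all |>
  dplyr::count(district, f1) |>
  mutate(pct = prop.table(n), .by = district)

# Total share per bar; each value should be 1
check <- pct_all |>
  summarise(total = sum(pct), .by = district)

# Same stacked bar chart, with the x axis labelled in percent
p <- ggplot(pct_all, aes(pct, district, fill = f1)) +
  geom_col() +
  scale_x_continuous(labels = scales::label_percent())
```

Keeping `df` untouched also means the block can be re-run without the row counts growing on each run.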

EDIT2: The order of the categories can be set by converting to a factor with the desired order of the levels. In the code below I use forcats::fct_relevel to make city the first category. To make the city label bold I use the ggtext package, which allows styling via Markdown, HTML or CSS. To this end I wrap city in **, i.e. Markdown syntax for bold. Moreover, we have to change the theme element to element_markdown.

df |>
  count(district, f1) |>
  mutate(pct = prop.table(n), .by = district) |>
  mutate(
    # Wrap the label in ** so that ggtext renders it in bold
    district = if_else(district == "city", "**city**", district),
    # Make the city the first factor level
    district = forcats::fct_relevel(district, "**city**")
  ) |>
  ggplot(aes(pct, district, fill = f1)) +
  geom_col() +
  # Render the y-axis labels as Markdown
  theme(axis.text.y = ggtext::element_markdown())
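
One thing to be aware of: on a discrete y axis ggplot2 places the first factor level at the bottom, so making city the first level draws it as the lowest bar. If the city bar should instead sit at the top, one possible variation (a sketch, not part of the code above) is to move the level to the end with the `after = Inf` argument of `forcats::fct_relevel`. The name `pct_all` is illustrative, and the ggtext package is assumed to be installed.

```r
library(dplyr)
library(ggplot2)

# Mock data from the question
df <- data.frame(
  id = 1:10,
  district = c("1", "1", "2", "3", "2", "1", "1", "2", "3", "2"),
  f1 = c("1", "2", "3", "1", "2", "3", "1", "2", "2", "3")
)

pct_all <- df |>
  mutate(district = "city") |>
  bind_rows(df) |>
  dplyr::count(district, f1) |>
  mutate(pct = prop.table(n), .by = district) |>
  mutate(
    district = if_else(district == "city", "**city**", district),
    # after = Inf moves "**city**" to the last level, which is drawn at the top
    district = forcats::fct_relevel(district, "**city**", after = Inf)
  )

# Level order, bottom bar first
levels(pct_all$district)

p <- ggplot(pct_all, aes(pct, district, fill = f1)) +
  geom_col() +
  theme(axis.text.y = ggtext::element_markdown())
```

Alternatively, keeping city as the first level and adding `scale_y_discrete(limits = rev)` reverses the whole axis, which puts city at the top and lists the districts in ascending order below it.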


stefan
  • Thanks, the plot looks great, but when I try to use your code (`count(district, f1) |> mutate(pct = prop.table(n), .by = district) |> ggplot(aes(pct, district, fill = f1)) + geom_col()`), I get an error message: `Error in count(df, district, f1) : object 'district' not found` – kehricht Jun 07 '23 at 06:43
  • Most likely a package conflict. Have you loaded the `plyr` package? Try using `dplyr::count` instead of just `count`. – stefan Jun 07 '23 at 06:46
  • Perfect, this worked. Thanks a lot. The only thing I want to add to the plot is the overall percentage for the city, not just the districts. Is it possible to add it as the first bar in the plot? – kehricht Jun 07 '23 at 06:51
  • Great. First, as a general rule the `plyr` package is retired and it is recommended to use `dplyr` instead, i.e. drop `plyr` from your code. If for whatever reason it is needed, then load `dplyr` or `tidyverse` before `plyr` to prevent conflicts with `dplyr`. Concerning your second question: See my edit. – stefan Jun 07 '23 at 07:03
  • Thanks; I dropped `plyr` from my code; I think this caused a lot of problems. Thanks for the fast help! – kehricht Jun 07 '23 at 08:46
  • Sorry to come back to you @stefan. I have managed to create some nice charts, thanks to your code. What I am trying to do now is to highlight the "city" label on the y-axis in bold (the rest of the labels should remain plain). Ideally, "city" would be the first bar and its label would be bold. Any idea how to do this? Best regards! – kehricht Jul 25 '23 at 13:57
  • I just added a second edit with the desired changes. – stefan Jul 25 '23 at 15:42
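
For the package conflict discussed in the comments, the following minimal sketch shows the namespace-qualified call, which resolves to dplyr's `count` regardless of which other packages are attached. The check via `search()` and the names `plyr_attached` and `counts` are illustrative only.

```r
library(dplyr)

# Mock data from the question
df <- data.frame(
  id = 1:10,
  district = c("1", "1", "2", "3", "2", "1", "1", "2", "3", "2"),
  f1 = c("1", "2", "3", "1", "2", "3", "1", "2", "2", "3")
)

# TRUE if plyr is currently attached to the search path
plyr_attached <- "package:plyr" %in% search()

# The pkg::fun form always calls dplyr's count(), even if another
# attached package exports a function with the same name
counts <- dplyr::count(df, district, f1)
```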