Below is an example of my dataset:

```r
structure(list(wheezing_InDMod = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 
1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0), cough_anyMod = c(0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0), SOB_anyMod = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), country.x = c("cameroon", 
"cameroon", "cameroon", "kenya", "cameroon", "ghana", "cameroon", 
"kenya", "cameroon", "kenya", "cameroon", "cameroon", "cameroon", 
"cameroon", "cameroon", "cameroon", "cameroon", "cameroon", "ghana", 
"cameroon", "kenya", "cameroon", "ghana", "cameroon", "cameroon", 
"cameroon")), row.names = 65:90, class = "data.frame")
```

For wheezing_InDMod, SOB_anyMod and cough_anyMod, 1 indicates that the individual has the symptom and 0 indicates that they do not.

I'm trying to plot a single bar graph showing the prevalence of each symptom (i.e. the 1s for wheezing_InDMod, SOB_anyMod and cough_anyMod) on the x-axis, with each symptom further split into 3 adjacent bars by country.x category. I'll attach an image below to give an idea of what I mean:

Does anyone know how I would go about creating this using ggplot? I've tried a few different approaches, but I haven't got very far.

[Image: ideal graph output]

  • To work well with ggplot you need to pivot your data to a long format so that you have a single column `Mod` with values `"Cough", "SOB", "Wheeze"`, and then you can tell ggplot `aes(x = Mod, fill = country.x)`. [Here's the FAQ for that](https://stackoverflow.com/q/2185252/903061). Something like `tidyr::pivot_longer(your_data, contains("Mod"), names_to = "Mod")` should do it. – Gregor Thomas Aug 02 '22 at 15:09
  • Thanks for the reply. The issue is that each response can have multiple symptoms (e.g. a response can have 1 for Cough, SOB and Wheeze; they aren't mutually exclusive), so I don't think I can put that data into a single column. – user18772311 Aug 02 '22 at 15:27
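
A minimal sketch of why non-exclusive symptoms need not be a problem for the long format. It uses a hypothetical two-respondent data frame (`toy` and its `id` column are illustrative and not part of the question's data). After pivoting, each respondent contributes one row per symptom, so a respondent with all three symptoms simply appears in three rows.

```r
library(tidyr)

# Hypothetical data: respondent 1 has all three symptoms, respondent 2 has one.
toy <- data.frame(
  id = c(1, 2),
  wheezing_InDMod = c(1, 0),
  cough_anyMod = c(1, 0),
  SOB_anyMod = c(1, 1),
  country.x = c("cameroon", "kenya")
)

# Stack the three symptom columns into a name column (Mod) and a value column.
toy_long <- pivot_longer(toy, contains("Mod"), names_to = "Mod")

# One row per respondent-symptom pair: 2 respondents x 3 symptoms.
nrow(toy_long)

# Respondent 1 keeps all three positive symptoms, each on its own row.
subset(toy_long, id == 1 & value == 1)
```

The symptoms are no longer separate columns, but no information is lost: the `Mod` column says which symptom a row refers to and `value` says whether it was present.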

1 Answer

The single column thing isn't an issue. Here's an example using the code from my comment. I'd be curious to see the code that you tried that made you think this was an issue.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

your_data %>%
  tidyr::pivot_longer(contains("Mod"), names_to = "Mod") %>%
  ## keep only 1s
  filter(value == 1) %>%
  ## clean up the names
  mutate(Mod = stringr::str_remove(Mod, "_.*")) %>%
  ggplot(aes(x = Mod, fill = country.x)) +
  geom_bar(position = position_dodge(preserve = "single"))
```
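
The code above plots counts of 1s. If "prevalence" is meant as the proportion of respondents in each country who have each symptom, one possible variation is sketched below. It assumes the symptom columns are strictly 0/1 with no missing values, and it uses a small hypothetical `sample_data` frame as a stand-in for the real data; the names `sample_data`, `prev` and `p` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the real data; swap in your own data frame.
sample_data <- data.frame(
  wheezing_InDMod = c(1, 0, 1, 0, 1, 0, 0, 1),
  cough_anyMod    = c(1, 0, 0, 0, 0, 0, 0, 0),
  SOB_anyMod      = c(1, 0, 0, 0, 0, 0, 0, 0),
  country.x = c("cameroon", "cameroon", "cameroon", "cameroon",
                "kenya", "kenya", "ghana", "ghana")
)

prev <- sample_data %>%
  pivot_longer(contains("Mod"), names_to = "Mod") %>%
  # drop the suffix, e.g. "cough_anyMod" becomes "cough"
  mutate(Mod = sub("_.*", "", Mod)) %>%
  group_by(country.x, Mod) %>%
  # the mean of a 0/1 indicator is the proportion of 1s
  summarise(prevalence = mean(value), .groups = "drop")

p <- ggplot(prev, aes(x = Mod, y = prevalence, fill = country.x)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Symptom", y = "Prevalence", fill = "Country")
p
```

Because the 0s are kept and summarised rather than filtered out, every country-symptom combination has a row (possibly with zero height), so a plain `"dodge"` keeps the bars aligned without `preserve = "single"`.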

Gregor Thomas
  • Thank you, this has sorted it. The example data was part of a larger, more complex data frame which wouldn't fit a long format, but I've subset the data needed for this graph and tried the code you listed above, which has worked. Appreciate your help. – user18772311 Aug 02 '22 at 16:02