I'm a beginner at R and am having trouble making a plot that shows how male and female respondents rate their health status ("Poor", "Fair", "Good", "Very Good", "Excellent"). The problem is that there are more female than male respondents, so I wrote a bit of code to try to change the y-axis from counts to percentages. Can someone please help? Here is the code:

brfss2013 %>%
  filter(!is.na(sex)) %>%
  count(sex) %>%
  mutate(perc = n / nrow(brfss2013)) -> brfss2

brfss2013 %>%
  filter(!is.na(sex)) %>%
  filter(!is.na(genhlth)) %>%
  group_by(sex, genhlth) %>%
  ggplot(brfss2013, mapping = aes(x = genhlth) +
    geom_bar(aes(fill = brfss2$sex), position = "dodge") +
    scale_fill_brewer("Gender") +
    labs(title = "Reported generalhealth - by gender", x = "general health - reported")
  • Hi - can you please post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? It's extremely difficult for us to help you without being able to run your code – Conor Neilson Mar 21 '20 at 19:02
  • Hi Conor, So, I still can't make one, as I am really, really new - 4 weeks ago I couldn't even run simple commands in RStudio. The data has 330 columns and 490,000 lines, so I don't know how representative a random sample would really be, anyway. – Tania Pescarini Mar 21 '20 at 20:01
  • Just one question: the first pipeline gives brfss2$perc as [1] 0.4093600 0.5906258, so 40.93% of respondents are male and 59.06% are female. Is it possible to write a line of code that randomly samples only 69.3% of the females, to roughly equalize the numbers of female and male respondents? Would that be a valid solution? – Tania Pescarini Mar 21 '20 at 20:14

1 Answer

@Tania - welcome to SO!

It is not entirely clear what your final desired plot should look like. But here is one potential way to pursue this. This example is based on BRFSS 2018 data.

First, it appears you would like to remove rows with NA.

Second, you can filter on those that have the expected SEX1 and GENHLTH values, to exclude those that answered "not sure" or "refused."

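Next, you can group_by both columns and compute the percentages. Note that the order of the columns in group_by matters: summarise() drops the last grouping variable, so the mutate() that follows computes percentages within the remaining group. With the grouping here, the percentages add up to 100 across GENHLTH within each sex.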

Finally, you can plot using the percentage as the vertical axis.

library(tidyverse)

BRFSS_b %>%
  drop_na() %>%                                # remove rows with missing values
  filter(SEX1 == 1 | SEX1 == 2,                # keep the two expected SEX1 codes
         GENHLTH >= 1 & GENHLTH <= 5) %>%      # keep ratings 1-5; drops "not sure" and "refused"
  group_by(SEX1, GENHLTH) %>%
  summarise(n = n()) %>%                       # count per sex and rating; result stays grouped by SEX1
  mutate(perc = n * 100 / sum(n)) %>%          # percentage within each sex
  ggplot(mapping = aes(x = factor(GENHLTH), y = perc, fill = factor(SEX1))) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_brewer("Gender") +
  labs(title = "Reported generalhealth - by gender", x = "general health - reported", y = "percent")

Output

plot of percentage vs gen health by sex

The data generated for ggplot looks like this:

# A tibble: 10 x 4
# Groups:   SEX1 [2]
    SEX1 GENHLTH     n  perc
   <dbl>   <dbl> <int> <dbl>
 1     1       1 33272 16.9 
 2     1       2 63670 32.3 
 3     1       3 63411 32.2 
 4     1       4 26554 13.5 
 5     1       5  9962  5.06
 6     2       1 38454 16.1 
 7     2       2 78260 32.8 
 8     2       3 74531 31.3 
 9     2       4 34053 14.3 
10     2       5 13057  5.48
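
To see why the order in group_by matters, here is a minimal sketch on made-up data. The data frame toy and the names within_sex and within_health are hypothetical and only for illustration; they are not BRFSS values. It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical toy data: 4 male and 6 female respondents (not real BRFSS values)
toy <- data.frame(
  sex     = c("M", "M", "M", "M", "F", "F", "F", "F", "F", "F"),
  genhlth = c("Good", "Good", "Good", "Fair", "Good", "Good", "Good", "Fair", "Fair", "Fair")
)

# Grouping by sex first: summarise() drops genhlth, so the result is still
# grouped by sex and the percentages add up to 100 within each sex
within_sex <- toy %>%
  group_by(sex, genhlth) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(perc = 100 * n / sum(n))

# Grouping by genhlth first: the result is still grouped by genhlth, so the
# percentages add up to 100 within each health rating instead
within_health <- toy %>%
  group_by(genhlth, sex) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(perc = 100 * n / sum(n))

print(within_sex)
print(within_health)
```

Because the first version normalizes within each sex, the unequal number of male and female respondents no longer distorts the comparison, so there should be no need to downsample either group.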

Edit 3/23/20:

If you want to plot "counts" instead of percent, you can do the following for ggplot. You need stat = "identity" in geom_bar, because the counts are already computed, and the grouping variables (GENHLTH and SEX1) should be factors (if not already converted).

# Replaces the ggplot() call at the end of the pipeline above; y is the numeric count n
ggplot(mapping = aes(x = factor(GENHLTH), y = n)) + 
  geom_bar(stat = "identity", aes(fill = factor(SEX1)), position = "dodge") + 
  scale_fill_brewer("Gender") + 
  labs(title = "General health by gender", x = "reported general health")
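
As a self-contained sketch of the same idea, the example below plots pre-computed counts from a small made-up table (toy_counts is hypothetical, not BRFSS data). It uses geom_col(), which is equivalent to geom_bar(stat = "identity"). With the default stat = "count", geom_bar() does the counting itself, so supplying both x and y triggers the error "stat_count() can only have an x or y aesthetic".

```r
library(ggplot2)

# Hypothetical pre-computed counts (not real BRFSS values)
toy_counts <- data.frame(
  sex     = factor(c("Male", "Male", "Female", "Female")),
  genhlth = factor(c("Good", "Fair", "Good", "Fair")),
  n       = c(3, 1, 3, 3)
)

# fill is mapped inside aes() so that it refers to the sex column of the data
p <- ggplot(toy_counts, aes(x = genhlth, y = n, fill = sex)) +
  geom_col(position = "dodge") +
  scale_fill_brewer("Gender") +
  labs(title = "General health by gender", x = "reported general health", y = "count")

print(p)
```

To plot percentages instead, map y to a pre-computed percentage column in place of n.
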
Ben
  • Your solution is great and the data preparation worked perfectly for me. However, I couldn't run it as one piece of code, so I created a new object, brfss3 (brfss3 <- brfss2013%>% filter(!is.na(sex))%>% filter(!is.na(genhlth))%>% group_by(sex, genhlth)%>%summarise(n = n())%>%mutate(perc = n*100 / sum(n))). – Tania Pescarini Mar 23 '20 at 10:04
  • This gives the right percentages for each gender's answers. But when I try to run it in ggplot (ggplot(brfss3, mapping = aes(x = genhlth, y = n)) + geom_bar(aes(fill = (sex)), position = "dodge") + scale_fill_brewer("Gender") + labs(title = "General health by gender", x = "reported general health")), it returns the following error: Error: stat_count() can only have an x or y aesthetic – Tania Pescarini Mar 23 '20 at 10:06
  • I tried ggplot(data = brfss3, mapping = aes(x = genhlth, y = n )) + geom_bar(fill(brfss2013$sex), position = "dodge", stat = "identity") + scale_fill_brewer("Gender") + labs(title = "Reported generalhealth - by gender", x = "general health - reported", y = "percentage") and got the following error: Error in UseMethod("fill_") : no applicable method for 'fill_' applied to an object of class "factor" – Tania Pescarini Mar 23 '20 at 22:50
  • which makes no sense, because sex is of class factor – Tania Pescarini Mar 23 '20 at 22:58
  • ben, now I finally made it. I tabled all the data and then got the maps right. Thank you so much! – Tania Pescarini Mar 24 '20 at 13:43