I am trying to calculate the percentages of cigarette smoking status by sex (for example, the % of males and the % of females who are Non-smokers, Occasional smokers, Regular smokers, Prefer not to say, etc.). My attempt below seems to calculate each percentage from the row total rather than the column total. Any help would be greatly appreciated.

Dataframe

smoking_data <- structure(list(sex = c("Female", "Male", "Female", "Female"), 
    cigarettes_smoking_status = c("Non-smoker", "Non-smoker", 
    "Non-smoker", "Non-smoker")), row.names = c(NA, 4L), class = "data.frame")

Code

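library(dplyr)   # group_by(), count(), mutate(), %>%
library(tidyr)   # pivot_wider()
library(janitor) # adorn_totals()

smoking_status_by_sex <- smoking_data %>% 
  group_by(sex) %>% 
  dplyr::count(cigarettes_smoking_status) %>% 
  pivot_wider(names_from = sex, values_from = n) %>% # one column per sex, one row per smoking status
  adorn_totals(c("row", "col"))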

smoking_status_by_sex_per <- smoking_status_by_sex %>% 
   mutate(female_pct = round((100*.[[2]]/Total),digits =2),
          male_pct = round((100*.[[3]]/Total),digits =2),
          prefer_not_to_say_pct = round((100*.[[4]]/Total), digits=2),
          unknown_pct = round((100*.[[5]]/Total),digits =2),
          total_pct = round((100*.[[6]]/Total), digits=2))

This is the table I am trying to replicate: https://i.stack.imgur.com/hhDA4.png

I have tried using count, colSums, adorn_totals, etc., and then pivot_wider.

  • I suggested a fix to the formatting of your question. Note that code is formatted in one of two ways: single backticks are used for inline, such as "`hello \`world\``" producing "hello `world`"; triple backticks are for full code blocks, and must be triple backticks on their own line, then the code, then the triple backticks again on their own line, not prepending every line of code. See https://stackoverflow.com/editing-help and https://meta.stackexchange.com/a/22189 (and my edit) for good examples of this. – r2evans Dec 17 '22 at 12:06
  • Could you please add the data to your question (see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? I guess the easiest way is to start out with `janitor::tabyl()` (or alternatively `gttable`). – harre Dec 17 '22 at 15:22
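
The `janitor::tabyl()` route mentioned in the comment above could look roughly like the sketch below. It is only an illustration: the six-row `smoking_data` here is made up (the data in the question has a single smoking status), and the object names `smoking_pct` and `smoking_table` are hypothetical. The key step is `adorn_percentages("col")`, which uses the column totals as the denominator instead of the row totals.

```r
library(dplyr)   # for %>%
library(janitor) # tabyl() and the adorn_*() helpers

# Hypothetical sample data, NOT the asker's real data
smoking_data <- data.frame(
  sex = c("Female", "Male", "Female", "Female", "Male", "Male"),
  cigarettes_smoking_status = c("Non-smoker", "Non-smoker", "Regular smoker",
                                "Non-smoker", "Occasional smoker", "Regular smoker")
)

# Counts of status by sex, with totals, converted to column proportions
smoking_pct <- smoking_data %>%
  tabyl(cigarettes_smoking_status, sex) %>%
  adorn_totals(c("row", "col")) %>%
  adorn_percentages("col")

# Display version: formatted percentages with the counts in brackets
smoking_table <- smoking_pct %>%
  adorn_pct_formatting(digits = 2) %>%
  adorn_ns()

print(smoking_table)
```

With this layout each sex column sums to 100%, so the Total row is 100% in every column.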

1 Answer

It's easier to group by sex and smoking status and then compute the relative frequencies within each sex. An example using the built-in starwars data is given below.

library(tidyverse)

df <- starwars

df %>% 
  group_by(eye_color, skin_color) %>%                  # one group per eye/skin colour pair
  summarise(count1 = n(), .groups = "drop_last") %>%   # result stays grouped by eye_color
  mutate(grouppercentage = 100 * count1 / sum(count1)) # % within each eye_color


chris jude
  • Thank you Chris, that really helps with the percentages. Do you know how I could use pivot_wider to collate the percentages into a table like the one at the bottom of my first post? I am having difficulty getting just the 5 rows (Non-smokers, Regular smokers, Prefer not to say, etc.). When I try it, it has 4 rows for non-smokers, 4 for occasional smokers, etc. Really appreciate everyone's time. – Matthew Sutherland Dec 17 '22 at 15:18
  • Can you share the data using dput(df)? – chris jude Dec 17 '22 at 16:45
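
For the pivot_wider follow-up, one likely cause of the repeated rows (an assumption, since the real data was not shared) is that the count column is still present when pivoting. `pivot_wider()` treats every column not named in `names_from` or `values_from` as an identifier, so each distinct count produces its own row. A minimal sketch that drops the counts first, using made-up data and the hypothetical names `pct` and `smoking_pct_wide`:

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data, NOT the asker's real data
smoking_data <- data.frame(
  sex = c("Female", "Male", "Female", "Female", "Male", "Male"),
  cigarettes_smoking_status = c("Non-smoker", "Non-smoker", "Regular smoker",
                                "Non-smoker", "Occasional smoker", "Regular smoker")
)

smoking_pct_wide <- smoking_data %>%
  count(sex, cigarettes_smoking_status) %>%            # n per sex/status pair
  group_by(sex) %>%
  mutate(pct = round(100 * n / sum(n), 2)) %>%         # % within each sex (column %)
  ungroup() %>%
  select(-n) %>%                                       # drop counts so they are not used as an id column
  pivot_wider(names_from = sex, values_from = pct,
              values_fill = 0)                         # status/sex pairs that never occur become 0

print(smoking_pct_wide)
```

This gives one row per smoking status and one percentage column per sex. If the counts are needed as well, passing both columns via `values_from = c(n, pct)` instead of dropping `n` should also keep the table at one row per status.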