1

Data comes from this RData dataset

Here is the script:

library(dplyr)
library(ggplot2)
load("brfss2013.RData")

test <- brfss2013 %>%
  select(chcscncr,exract11) %>% 
  filter(chcscncr != "NA" , exract11 != "NA") %>% 
  group_by(exract11,chcscncr) %>% 
  summarise(count = n())

Which results in this table:

> head(test)
Source: local data frame [6 x 3]
Groups: exract11 [3]

                                                  exract11 chcscncr count
                                                    <fctr>   <fctr> <int>
1 Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19
2 Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287
3                                  Aerobics video or class      Yes   800
4                                  Aerobics video or class       No  7340
5                                              Backpacking      Yes     4
6                                              Backpacking       No    38

I would like to achieve a table that gives the "yes" proportion of each type of sport, something like:

From

Type     Ans Count
Sport A  yes 45
Sport A  no  55
Sport B  yes 34
Sport B  no  66

to:

Type      p(yes)
Sport A   0.45
Sport B   0.34
luisgonzalez
  • 153
  • 11
  • 1
    A small change to http://stackoverflow.com/questions/29549731/dplyr-finding-percentage-in-a-sub-group-using-group-by-and-summarise - something like: `dat %>% group_by(Type) %>% summarise(p_yes = Count[Ans=="yes"] / sum(Count) )` maybe? I'm no dplyr expert, and that seems a bit clunky, so I'll leave it to the geniuses here at SO to suggest something else. – thelatemail Aug 22 '16 at 00:59

1 Answers1

5

prop.table converts totals to proportions (in this case, just x/sum(x) for the values for each group), so for your "From" table:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>%    # `==` doesn't work for NA
    count(exract11, chcscncr) %>%    # equivalent to `group_by(...) %>% summarise(n = n())`
    group_by(exract11) %>%
    mutate(pct = prop.table(n) * 100)    # `* 100` to convert to percent

## Source: local data frame [144 x 4]
## Groups: exract11 [75]
## 
##                                                    exract11 chcscncr     n      pct
##                                                      <fctr>   <fctr> <int>    <dbl>
## 1  Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19  6.20915
## 2  Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287 93.79085
## 3                                   Aerobics video or class      Yes   800  9.82801
## 4                                   Aerobics video or class       No  7340 90.17199
## 5                                               Backpacking      Yes     4  9.52381
## 6                                               Backpacking       No    38 90.47619
## 7                                                 Badminton      Yes     4 10.52632
## 8                                                 Badminton       No    34 89.47368
## 9                                                Basketball      Yes    37  1.64664
## 10                                               Basketball       No  2210 98.35336
## # ... with 134 more rows

For your "to" table, filter to just the "Yes" rows:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>% 
    count(exract11, chcscncr) %>%
    group_by(exract11) %>%
    mutate(p_yes = prop.table(n)) %>%
    filter(chcscncr == "Yes")

## Source: local data frame [69 x 4]
## Groups: exract11 [69]
## 
##                                                                 exract11 chcscncr     n      p_yes
##                                                                   <fctr>   <fctr> <int>      <dbl>
## 1               Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19 0.06209150
## 2                                                Aerobics video or class      Yes   800 0.09828010
## 3                                                            Backpacking      Yes     4 0.09523810
## 4                                                              Badminton      Yes     4 0.10526316
## 5                                                             Basketball      Yes    37 0.01646640
## 6                                             Bicycling machine exercise      Yes   987 0.13708333
## 7                                                              Bicycling      Yes   728 0.08519602
## 8  Boating (Canoeing, rowing, kayaking, sailing for pleasure or camping)      Yes    22 0.11518325
## 9                                                                Bowling      Yes    68 0.09985316
## 10                                                               Boxing       Yes     5 0.01633987
## # ... with 59 more rows

The proportion of "Yes" values is pretty small, as you can see from the first table.

alistaire
  • 42,459
  • 4
  • 77
  • 117
  • Great answer, and thanks for the improvements. Naturally, the "yes" values are small because "yes" means "Has a doctor, nurse, or other health professional ever told you that you had skin cancer?". – luisgonzalez Aug 22 '16 at 02:35
  • Ha, data makes a lot more sense when you know what it is. Maybe those numbers aren't so low... – alistaire Aug 22 '16 at 02:51