0

I am having some difficulty creating nested crosstabs and getting the correct calculations as the value in tables. What I am looking to create is:

 Race         resCallCount    Completion Rate

Caucasian        1                0.53%

Caucasian        2                0.48%

Caucasian        3                0.32%

Caucasian        4                0.16%

Caucasian        5                0.07%

Caucasian        6                0.00%

Where completion rate is computed as: percent = complete/sum(n))

n is calculated from add_count and has each case marked as 1

I've been trying

CellAttempts <- subset(combined2, CELL == 1)

CellAttempts <- add_count(CellAttempts, ID)

group_by(CellAttempts, RACE) %>% transmute(resCallCount, percent = 
complete/sum(n))`

But only get

Groups:   RACE [13]
   RACE      resCallCount percent
   <chr>            <int>   <dbl>
 1 Caucasian            1      NA
 2 Caucasian            1      NA
 3 Caucasian            1      NA
 4 Caucasian            1      NA
 5 Caucasian            1      NA
 6 Caucasian            1      NA
 7 Caucasian            1      NA
 8 Caucasian            1      NA
 9 Caucasian            1      NA
10 Caucasian            1      NA
 ... with 520,337 more rows

Any help is appreciated

EDIT: here is what the initial data frame looks like:

my data is stacked by individual with multiple rows for each.

  ID             resCallCount resCodeResult   AGE RACE      complete     n
  <chr>                 <int> <chr>         <int> <chr>        <dbl> <int>
1 NY2252a_45493             1 P1               62 Caucasian       1     1
2 NY2252a_45494             1 P1               50 Caucasian       NA     1
3 NY2252a_454911            1 P1               31 Caucasian       NA     1
4 NY2252a_454917            1 12               57 Caucasian       1     1
5 NY2252a_454919            1 P1               80 Caucasian       1     1
6 NY2252a_454928            1 P1               30 Caucasian       1     1
Trish
  • 1
  • 1

1 Answers1

0

If I understand correctly, I think you're eliding the grouping. I'm guessing you want to summarise over all IDs for a given resCallCount and RACE. This is not explicitly noted in your example; ID is omitted from transmute so you'll still get a row for every unique ID, which I don't think is intentional.

I would propose something like this; the core thing here being that we're summing over complete with na.rm=TRUE to sum. So

CellAttempts %>%
  group_by(RACE, resCallCount) %>%
  summarise(percent = sum(complete, na.rm=TRUE)/sum(n))

Which yields for the example data:

# A tibble: 1 x 3
# Groups:   RACE [?]
  RACE      resCallCount percent
  <chr>            <int>   <dbl>
1 Caucasian            1   0.667
user295691
  • 7,108
  • 1
  • 26
  • 35