I want to calculate the mean and standard deviation of contacts for twenty types of hospital service in the two arms of a trial. So far I have done this using group_by(arm, service). This gives the average among the people who use that service in that arm. What my boss wants instead is the total contacts for each service divided by everyone in that arm.
So, if there are 100 cardiology contacts and 30 patients in an arm, but only 10 of them attend a cardiology appointment, the calculation should be 100/30 rather than 100/10. The only way I can think of doing it is to split the arms out into separate datasets, so that I would only need to group by service, which I think would solve the problem.
An example of what this looks like:
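library(dplyr)  # provides tibble(), group_by(), summarise() and %>%
rep_prob <- tibble(id = 1:6, arm = c(1,1,1,0,0,0), service = c(1,1,2,1,2,2), contacts = c(21,3,14,2,5,10)) %>%
  group_by(arm, service) %>%
  # mean and SD among only the patients who used each service
  summarise(mean = mean(contacts), sd = sd(contacts))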
Which gives results that look like this:
arm  service  mean  sd
0    1         2.0  NaN
0    2         7.5  3.535534
1    1        12.0  12.727922
1    2        14.0  NaN
Instead, I want the mean and SD of contacts for each service calculated over everyone in the arm as a whole, not just over the subgroup of patients in that arm who used that service.
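To make the target concrete, here is a minimal sketch of the calculation I think is being asked for. It assumes that each id is one patient, that a patient with no row for a service had zero contacts with it, and that tidyr is available; the name `desired` is just illustrative. It fills in the missing patient-by-service combinations with zeros before summarising, so it may not be the best or only way to do this.

```r
library(dplyr)
library(tidyr)

# Raw example data: one row per patient-service combination that occurred
raw <- tibble(
  id = 1:6,
  arm = c(1, 1, 1, 0, 0, 0),
  service = c(1, 1, 2, 1, 2, 2),
  contacts = c(21, 3, 14, 2, 5, 10)
)

desired <- raw %>%
  # Assumption: a patient with no row for a service had 0 contacts with it.
  # nesting(id, arm) keeps each patient in their own arm only.
  complete(nesting(id, arm), service, fill = list(contacts = 0)) %>%
  group_by(arm, service) %>%
  # Every patient in the arm now contributes, so the denominator is the arm size
  summarise(mean = mean(contacts), sd = sd(contacts), .groups = "drop")

print(desired)
```

With this, arm 1 and service 1 would have a mean of (21 + 3 + 0) / 3 = 8 rather than 12, because the third patient in that arm counts as a zero.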
This is apparently very easy in Stata and I am the only person in my department who uses R. For all my other results tables I am only slicing my table by one variable and so using group_by(arm) and then summarising works.