I am a public health student learning R programming for health research and epidemiology. I am currently working on the data set shown below: data set
What I am trying to do is convert age into age categories, for example (0-20), (21-40), (41-60), and so on. I then want to see the mean and standard deviation for a specific disease type, e.g. NKEP, within each age category. To convert age into age categories I am using this code:
library(dplyr)
practice$agegroup <- cut(practice$age, breaks = c(0, 20, 40, 60, 80, 100), labels = c("0-20", "21-40", "41-60", "61-80", "81-100"), right = TRUE)
To see the number of cases of each disease type within each age category, I tried this:
practice %>% group_by(agegroup) %>% count(agegroup, dtype)
This gives the following output:
# A tibble: 22 x 3
# Groups:   agegroup [5]
   agegroup dtype       n
   <fct>    <chr>   <int>
 1 0-20     KATF        6
 2 0-20     NKA       427
 3 0-20     PKDL      264
 4 0-20     RELAPSE    44
 5 21-40    CL          5
 6 21-40    KATF        2
 7 21-40    NKA       440
 8 21-40    PKDL      285
 9 21-40    RELAPSE   106
10 41-60    CL          2
# ... with 12 more rows
Is it possible to add two new columns to each row showing the mean and standard deviation for each combination of disease type and age category? I have tried base R commands as well as the tidyverse, dplyr and doBy packages, but every time I include the age category in the mean or SD calculation I get errors like "the variable have to be numeric" or "the variables are not of same length".
How can I solve this problem? Is the newly created age category variable (agegroup) somehow not converted properly? How can I get the mean and standard deviation for each category? I would be very grateful for any help or suggestions. Thanks
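A minimal sketch of the kind of summary being asked for is given here, under stated assumptions. It assumes the numeric variable to summarise is `age`; the factor `agegroup` is used only for grouping, since a factor cannot be averaged, which is one plausible source of the "have to be numeric" error. The data frame `toy` is a hypothetical stand-in for `practice` with made-up values, and `include.lowest = TRUE` is an added assumption so that an age of exactly 0 is not turned into NA.

```r
library(dplyr)

# Hypothetical toy data standing in for `practice` (values are made up)
toy <- data.frame(
  age   = c(5, 15, 18, 25, 35, 38, 45, 55),
  dtype = c("NKA", "NKA", "PKDL", "NKA", "NKA", "PKDL", "NKA", "NKA")
)

# Bin the numeric age into labelled categories;
# include.lowest = TRUE keeps an age of exactly 0 in the first bin
toy$agegroup <- cut(
  toy$age,
  breaks = c(0, 20, 40, 60, 80, 100),
  labels = c("0-20", "21-40", "41-60", "61-80", "81-100"),
  right = TRUE,
  include.lowest = TRUE
)

# Group by the factor and the disease type, then summarise the numeric column
result <- toy %>%
  group_by(agegroup, dtype) %>%
  summarise(
    n        = n(),
    mean_age = mean(age, na.rm = TRUE),
    sd_age   = sd(age, na.rm = TRUE),
    .groups  = "drop"
  )

print(result)
```

With this approach, groups that contain a single row get `NA` for the standard deviation, because `sd()` needs at least two values. To restrict the result to one disease type, a `filter(dtype == "NKA")` step could be added before `group_by()`.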