I am a public health student learning R for health research and epidemiology. I am currently working on the data set shown below: data set

I am trying to convert age into age categories, for example (0-20), (21-40), (41-60), and so on. I then want to see the mean and standard deviation for a specific disease type, e.g. NKA, in each age category. To convert age into an age category I am using this code:

library(dplyr)
practice$agegroup= cut(practice$age, breaks = c(0,20,40,60,80,100),labels = c("0-20","21-40","41-60","61-80","81-100"),right = TRUE)

To see the disease types within each age category, I tried this:

practice %>% group_by(agegroup) %>% count(agegroup,dtype)

This gives me the following output:

# A tibble: 22 x 3
# Groups:   agegroup [5]
   agegroup dtype       n
   <fct>    <chr>   <int>
 1 0-20     KATF        6
 2 0-20     NKA       427
 3 0-20     PKDL      264
 4 0-20     RELAPSE    44
 5 21-40    CL          5
 6 21-40    KATF        2
 7 21-40    NKA       440
 8 21-40    PKDL      285
 9 21-40    RELAPSE   106
10 41-60    CL          2
# ... with 12 more rows

Is it possible to add two new variables to each row that show the mean and standard deviation for each disease type and age category? I have tried base R as well as the tidyverse, dplyr, and doBy packages, but every time I include the age category in the mean and SD calculation I get errors such as "the variable has to be numeric" or "the variables are not of the same length".

How can I solve this problem? Was the newly created age category variable somehow not converted properly? How can I get the mean and standard deviation for each category? I would be grateful for any help or suggestions. Thanks
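One likely reading of those error messages is that `cut()` returns a factor, which has no numeric mean, so the statistics have to be computed on the numeric `age` column within each group. The following is only a minimal sketch under that assumption: the small `practice` data frame is a hypothetical stand-in for the real data, `include.lowest = TRUE` is an added assumption so that an age of exactly 0 is not dropped, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for `practice`; the real data set is not shown here
practice <- data.frame(
  age   = c(5, 18, 20, 25, 33, 40, 47, 59, 62, 75),
  dtype = c("NKA", "PKDL", "NKA", "NKA", "PKDL",
            "NKA", "PKDL", "NKA", "NKA", "PKDL")
)

practice$agegroup <- cut(
  practice$age,
  breaks = c(0, 20, 40, 60, 80, 100),
  labels = c("0-20", "21-40", "41-60", "61-80", "81-100"),
  right = TRUE,
  include.lowest = TRUE  # assumption: age 0 should fall in "0-20"
)

# cut() returns a factor, so mean(practice$agegroup) yields NA with a warning
is.factor(practice$agegroup)

# Summarise the numeric age column within each age group and disease type
age_summary <- practice %>%
  group_by(agegroup, dtype) %>%
  summarise(
    n        = n(),
    mean_age = mean(age),
    sd_age   = sd(age),    # NA when a group contains a single row
    .groups  = "drop"      # assumes dplyr >= 1.0.0
  )

print(age_summary)
```

Using `mutate()` instead of `summarise()` would keep every original row and repeat the group statistics on each of them, which is closer to "two new variables for each row".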

  • Mean and standard deviation of what variable? [Related](https://stackoverflow.com/questions/9847054/how-to-get-summary-statistics-by-group). – Rui Barradas Feb 03 '20 at 15:59
  • This should be pretty simple: `practice %>% group_by(agegroup, dtype) %>% mutate(mean = mean(variable), sd = sd(variable))`, but the variable has to be numeric. – jyr Feb 03 '20 at 16:07
  • Hello, and thanks for your reply. @Rui Barradas, I mean the mean and standard deviation of the variable named "dtype". This column holds disease categories such as KATF, PKDL, and NKA, which are not numeric. However, when I build a frequency table from age category and dtype, I get the frequency of each disease category in each age group, which is numeric (the n column in the frequency table I posted above). From that table I want to calculate the mean and standard deviation, that is, the mean and standard deviation of grouped data, exactly like this: http://mathforum.org/library/drmath/view/52199.html – Asif Zaman Khan Feb 03 '20 at 19:35
  • @jyr, thanks for your help. I have tried your suggestion, but the mean comes out different for each age group. Shouldn't the mean be the same for grouped data? I filtered the data for each disease type and then ran `practice %>% group_by(agecat, dtype) %>% mutate(mean = mean(age), sd = sd(age))`, but the mean is different for each age group. I am really sorry if I failed to make this clear. Here is a video of what I am trying to do: https://www.youtube.com/watch?v=x6Pnf57wRAs – Asif Zaman Khan Feb 03 '20 at 19:55
  • You don't have to group the data before calculating it then; you can just use `mean(practice$age)` and `sd(practice$age)`. – jyr Feb 03 '20 at 21:01
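The grouped-data calculation described in the comments above (one mean and one standard deviation per disease type, estimated from a frequency table) can be sketched as follows. This is an illustration only: the counts are made up rather than taken from the real data, the class midpoints are assumed to be the average of each interval's lower and upper bound, and the sample (n - 1) form of the standard deviation is an assumption about which version is wanted.

```r
# Hypothetical frequency table for a single disease type; counts are made up
freq <- data.frame(
  agegroup = c("0-20", "21-40", "41-60", "61-80", "81-100"),
  lower    = c(0, 21, 41, 61, 81),
  upper    = c(20, 40, 60, 80, 100),
  n        = c(10, 20, 40, 20, 10)
)

# Assumption: each class is represented by the midpoint of its bounds
freq$mid <- (freq$lower + freq$upper) / 2

total <- sum(freq$n)

# Grouped mean: frequency-weighted average of the class midpoints
grouped_mean <- sum(freq$n * freq$mid) / total

# Grouped sample SD: frequency-weighted squared deviations over (total - 1)
grouped_sd <- sqrt(sum(freq$n * (freq$mid - grouped_mean)^2) / (total - 1))

grouped_mean
grouped_sd
```

When the individual ages are available, as they are in `practice`, filtering to one disease type and calling `mean(age)` and `sd(age)` directly is more precise than the midpoint approximation, which is mainly useful when only the frequency table exists.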
