
I am working on a dataset of 35 variables. I have derived age dummy variables to classify patients into different age groups. Now I want to aggregate the total number of cases, and the number of cases in each age group, by date and location. Below is the code I have tried; however, I am not getting the sum of cases in each age group. For example, if there are 10 cases in total, those ten cases should be divided among the age groups, but NAs appear instead. In some groups only 1 or 2 cases show up in a few age groups, which does not add up to the total number of cases.

df_sa2 <- aggregate(
  cbind(cases = df_sa1$cases, agecat1 = df_sa1$agecat1, agecat2 = df_sa1$agecat2,
        agecat3 = df_sa1$agecat3, agecat4 = df_sa1$agecat4, agecat5 = df_sa1$agecat5),
  by = list(Date = df_sa1$date, location = df_sa1$location),
  FUN = sum
)

I have checked the data types; they are all numeric.

Please suggest what is wrong with the code. Thank you.
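A minimal sketch that reproduces the NA symptom, using a small invented data frame (only the column names come from the question, and only two age dummies are used for brevity). A numeric column can still contain NA, and sum() returns NA for any date/location group that holds a missing value. Passing na.rm = TRUE through aggregate()'s ... argument is one way to skip those values:

```r
# Hypothetical toy data: one row per patient record with 0/1 age dummies.
# Column names follow the question; the values are invented.
df_sa1 <- data.frame(
  date     = as.Date(c("2020-10-01", "2020-10-01", "2020-10-01", "2020-10-02")),
  location = c("A", "A", "A", "B"),
  cases    = c(1, 1, 1, 1),
  agecat1  = c(1, 0, NA, 0),
  agecat2  = c(0, 1, 0, 1)
)

# Same call shape as in the question: sum() gives NA for any group
# containing a missing value (agecat1 for location A here).
res_na <- aggregate(
  cbind(cases = df_sa1$cases, agecat1 = df_sa1$agecat1, agecat2 = df_sa1$agecat2),
  by = list(Date = df_sa1$date, location = df_sa1$location),
  FUN = sum
)

# Arguments given after FUN are passed on to it, so na.rm = TRUE reaches sum().
res_ok <- aggregate(
  cbind(cases = df_sa1$cases, agecat1 = df_sa1$agecat1, agecat2 = df_sa1$agecat2),
  by = list(Date = df_sa1$date, location = df_sa1$location),
  FUN = sum, na.rm = TRUE
)

print(res_na)
print(res_ok)
```

Note that with na.rm = TRUE the age-group sums can still fall short of cases wherever a record's age category is missing.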

starja
  • It would be helpful if you could provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), as it's a little hard to understand exactly what it is you're hoping to achieve. Thanks. – A.Fowler Oct 08 '20 at 20:06


Consider the formula interface of aggregate, which reads more clearly and lets you use the data argument instead of the numerous df_sa1$ qualifiers.

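In formula style, the numeric column(s) to be summed go to the left of ~ and the categorical grouping columns go to the right. This removes the need for list(), and also for cbind() when a single column such as cases is summed; to sum several columns at once, cbind() is placed on the left-hand side.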

fml <- cases ~ date + location + agecat1 + agecat2 + agecat3 + agecat4 + agecat5

df_sa2 <- aggregate(fml, data=df_sa1, FUN=sum)
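If the age dummies should be summed rather than used for grouping (as asked in the comments below), they can go on the left-hand side together with cases. A sketch on a small invented data frame, where only the column names come from the question. It assumes each row is one case, so the five age-group sums add up to cases for every date/location; if a row can represent several cases, summing 0/1 dummies counts rows rather than cases.

```r
# Hypothetical toy data: four patient records with 0/1 age dummies
# (invented values; column names follow the question).
df_sa1 <- data.frame(
  date     = as.Date(c("2020-10-01", "2020-10-01", "2020-10-01", "2020-10-02")),
  location = c("A", "A", "A", "B"),
  cases    = c(1, 1, 1, 1),
  agecat1  = c(1, 0, 0, 0),
  agecat2  = c(0, 1, 0, 0),
  agecat3  = c(0, 0, 1, 0),
  agecat4  = c(0, 0, 0, 1),
  agecat5  = c(0, 0, 0, 0)
)

# Summed columns on the left (bound with cbind), grouping columns on the right.
df_sa2 <- aggregate(
  cbind(cases, agecat1, agecat2, agecat3, agecat4, agecat5) ~ date + location,
  data = df_sa1, FUN = sum
)

print(df_sa2)
```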

# TO ACCOUNT FOR POTENTIAL MISSING VALUES IN df_sa1$cases
df_sa2 <- aggregate(fml, data=df_sa1, FUN=function(x) sum(x, na.rm=TRUE), na.action=na.pass) 
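The na.action argument matters here because the formula method defaults to na.action = na.omit, which drops the whole row before summing, so the other columns in that row are lost too. With na.pass the row is kept and na.rm = TRUE skips only the missing value. A sketch with invented data; the cbind() left-hand side is an illustrative addition, not part of the formula above:

```r
# Hypothetical toy data with one missing `cases` value (invented values).
df_sa1 <- data.frame(
  date     = as.Date(c("2020-10-01", "2020-10-01", "2020-10-02")),
  location = c("A", "A", "B"),
  cases    = c(1, NA, 1),
  agecat1  = c(1, 0, 0),
  agecat2  = c(0, 1, 1)
)

# Default na.action = na.omit: the row with the missing `cases` is dropped
# entirely, so its agecat2 dummy is lost as well.
res_omit <- aggregate(cbind(cases, agecat1, agecat2) ~ date + location,
                      data = df_sa1, FUN = sum)

# na.pass keeps the row; na.rm = TRUE then skips only the missing value.
res_pass <- aggregate(cbind(cases, agecat1, agecat2) ~ date + location,
                      data = df_sa1,
                      FUN = function(x) sum(x, na.rm = TRUE),
                      na.action = na.pass)

print(res_omit)
print(res_pass)
```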

If you need a separate grouping for each age category, adjust the formula accordingly:

fml <- cases ~ date + location + agecat1
fml <- cases ~ date + location + agecat2
# ...
fml <- cases ~ date + location + agecat5
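Rather than editing the formula by hand for each category, the same idea can be looped over the dummy names. A sketch, assuming the dummies are named agecat1 to agecat5 as in the question; reformulate() from the stats package builds each formula, and the data below are invented:

```r
# Hypothetical toy data (invented values; column names follow the question).
df_sa1 <- data.frame(
  date     = as.Date(c("2020-10-01", "2020-10-01", "2020-10-01", "2020-10-02")),
  location = c("A", "A", "A", "B"),
  cases    = c(1, 1, 1, 1),
  agecat1  = c(1, 0, 0, 0),
  agecat2  = c(0, 1, 0, 0),
  agecat3  = c(0, 0, 1, 0),
  agecat4  = c(0, 0, 0, 1),
  agecat5  = c(0, 0, 0, 0)
)

age_cols <- paste0("agecat", 1:5)

# For each dummy, build cases ~ date + location + agecatN and aggregate.
by_age <- lapply(age_cols, function(col) {
  fml <- reformulate(c("date", "location", col), response = "cases")
  aggregate(fml, data = df_sa1, FUN = sum)
})
names(by_age) <- age_cols

print(by_age$agecat1)
```

Each element of by_age is one data frame per age category; every one of them partitions all rows, so their cases columns each sum to the overall total.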
Parfait
  • Thank you for your valuable suggestions; however, the code is not working (error in FUN, error in eval(predvar, data, env)). Also, I want to keep agecat1, 2, 3, 4 and 5 on the left-hand side of the formula, i.e. I want to split all cases into the different age categories. – Hira Fatima Oct 10 '20 at 18:30
  • I want something like this at the end: – Hira Fatima Oct 10 '20 at 18:36
  • I took a gamble at answering without data. Please see above comment on setting up a reproducible example. And please edit your question and do not post code or data in comments. – Parfait Oct 10 '20 at 21:38