I need to build a frequency table from two categorical columns of the brfss2013 data set: one is a 5-year age group and the other is health status (five states). I extracted the columns of interest with:
> hlthgrpq1 <- brfss2013 %>% select(genhlth, X_ageg5yr)
This gives a two-column data frame with 491775 observations of 2 variables:
    genhlth    X_ageg5yr
1      Fair Age 60 to 64
2      Good Age 50 to 54
3      Good Age 55 to 59
4 Very good Age 60 to 64
5      Good Age 65 to 69
I can generate a summary table with the 'by' function:
> by(hlthgrpq1$genhlth, hlthgrpq1$X_ageg5yr, summary)
hlthgrpq1$X_ageg5yr: Age 18 to 24
Excellent Very good Good Fair Poor NA's
6896 10266 7795 1873 303 69
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 25 to 29
Excellent Very good Good Fair Poor NA's
5779 8488 6521 1751 325 46
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 30 to 34
Excellent Very good Good Fair Poor NA's
6412 9958 7977 2295 496 75
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 35 to 39
Excellent Very good Good Fair Poor NA's
6366 10169 8236 2637 638 61
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 40 to 44
Excellent Very good Good Fair Poor NA's
6689 11130 9193 3334 1067 95
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 45 to 49
Excellent Very good Good Fair Poor NA's
7051 12278 10611 4343 1815 112
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 50 to 54
Excellent Very good Good Fair Poor NA's
8545 15254 13761 6354 3120 139
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 55 to 59
Excellent Very good Good Fair Poor NA's
8500 16759 15394 7643 3998 197
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 60 to 64
Excellent Very good Good Fair Poor NA's
8283 16825 16266 8101 3955 229
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 65 to 69
Excellent Very good Good Fair Poor NA's
7479 15764 15600 7749 3200 205
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 70 to 74
Excellent Very good Good Fair Poor NA's
5491 11943 13125 6491 2721 196
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 75 to 79
Excellent Very good Good Fair Poor NA's
3320 8501 10128 5545 2426 173
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 80 or older
Excellent Very good Good Fair Poor NA's
3697 10285 14400 8116 3695 322
And that's where I get stuck. I have spent hours trying to get here:
Results obtained via spreadsheet.
Thanks for any help.
(This is for a specific assignment, so I can only use dplyr and ggplot2: no reshape2 or tidyr.)
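To make the target shape concrete, here is a minimal sketch on a made-up toy data frame (`toy` is hypothetical and is not the real brfss2013 data). It shows a long count table via `count()`, a wide one-row-per-age-group table built with `group_by()` and `summarise()` only, and base R's `table()` for comparison. The output column names such as `Very_good` and `Missing` are my own choices, and whether base `table()` is allowed under the assignment rules is an assumption.

```r
library(dplyr)

# Toy stand-in for hlthgrpq1 (made-up rows, NOT the real brfss2013 data)
toy <- data.frame(
  genhlth = factor(
    c("Fair", "Good", "Good", "Very good", "Good", NA),
    levels = c("Excellent", "Very good", "Good", "Fair", "Poor")
  ),
  X_ageg5yr = factor(
    c("Age 60 to 64", "Age 50 to 54", "Age 55 to 59",
      "Age 60 to 64", "Age 65 to 69", "Age 60 to 64")
  )
)

# Long format: one row per (age group, health status) pair, count in column n
long_counts <- toy %>% count(X_ageg5yr, genhlth)

# Wide format with dplyr only: one row per age group, one column per status
wide_counts <- toy %>%
  group_by(X_ageg5yr) %>%
  summarise(
    Excellent = sum(genhlth == "Excellent", na.rm = TRUE),
    Very_good = sum(genhlth == "Very good", na.rm = TRUE),
    Good      = sum(genhlth == "Good", na.rm = TRUE),
    Fair      = sum(genhlth == "Fair", na.rm = TRUE),
    Poor      = sum(genhlth == "Poor", na.rm = TRUE),
    Missing   = sum(is.na(genhlth))  # corresponds to the NA's column from summary()
  )

# Base R cross-tabulation, for comparison (keeps NA as its own column)
base_counts <- table(toy$X_ageg5yr, toy$genhlth, useNA = "ifany")
```

Swapping `toy` for `hlthgrpq1` should give one row per age group with the same counts that `by(..., summary)` prints, just in a single table.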