I'm trying to get better with dplyr and tidyr but I'm not used to "thinking in R". An example may be best. The table I've generated from my data in sql looks like this:
╔═══════════╦════════════╦═════╦════════╦══════════════╦══════════╦══════════════╗ ║ patientid ║ had_stroke ║ age ║ gender ║ hypertension ║ diabetes ║ estrogen HRT ║ ╠═══════════╬════════════╬═════╬════════╬══════════════╬══════════╬══════════════╣ ║ 934988 ║ 1 ║ 65 ║ M ║ 1 ║ 1 ║ 0 ║ ║ 94044 ║ 0 ║ 69 ║ F ║ 1 ║ 0 ║ 0 ║ ║ 689348 ║ 0 ║ 56 ║ F ║ 0 ║ 1 ║ 1 ║ ║ 902498 ║ 1 ║ 45 ║ M ║ 0 ║ 0 ║ 1 ║ ║ … ║ ║ ║ ║ ║ ║ ║ ╚═══════════╩════════════╩═════╩════════╩══════════════╩══════════╩══════════════╝
I would like to create an output table that conveys the following information:
╔══════════════╦════════╦══════════╦══════════╦══════════╦═══════════╗ ║ ║ total ║M lt50 yo ║F lt50 yo ║M gte50yo ║F gte 50yo ║ ╠══════════════╬════════╬══════════╬══════════╬══════════╬═══════════╣ ║ estrogen HRT ║ 347 ║ 2 ║ 65 ║ 4 ║ 97 ║ ║ diabetes ║ 13922 ║ 54 ║ 73 ║ 192 ║ 247 ║ ║ hypertension ║ 8210 ║ 102 ║ 187 ║ 443 ║ 574 ║ ╚══════════════╩════════╩══════════╩══════════╩══════════╩═══════════╝
Total is the total number of patients with that comorbidity (easy enough: sum(data$estrogen == 1) etc). The other cells are now the number of patients with that comorbidity in that age and gender stratification where had_stroke==1.
I'd love to just get a general idea of how to approach problems like this as it seems like a pretty fundamental way to transform data. If the total column makes it funky then feel free to exclude that.