I have a question similar to this one.
I want to convert various dummy/logical variables into a single categorical variable/factor based on their names in R. My question is different because there can be many groups of variables that need to be encoded, for example age and chol_test below. This is just a subset of my data frame; there are additional variables, such as diabetes_test, that would also need to be converted, so I can't just use starts_with("condition").
I want to encode the lows as 1, the mediums as 2, and the highs as 3. If all the dummy variables in a group are 0, the result should be NA.
list(low = 1, medium = 2, high = 3)
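To make the rule concrete, here is a rough base R sketch of what I mean for a single group. It only illustrates the rule and is not the dplyr solution I'm after. The helper name encode_group and the toy data frame are made up for this example, and it assumes the columns are named like "<group>.<level>_tm1" and that at most one dummy per group is 1 in any row.

```r
# Hypothetical helper (not from any package): collapse one group of 0/1 dummy
# columns into a single numeric code.
# Assumptions: columns are named "<group>.<level>_tm1", and at most one dummy
# per group equals 1 in any given row.
encode_group <- function(data, group, codes = c(low = 1, medium = 2, high = 3)) {
  prefix <- paste0(group, ".")
  cols <- names(data)[startsWith(names(data), prefix)]
  # Pull the level name ("low", "medium", "high") out of each column name
  lvls <- sub("_tm1$", "", substring(cols, nchar(prefix) + 1))
  # Start with NA everywhere, so rows with all zeros stay NA
  out <- rep(NA_real_, nrow(data))
  for (i in seq_along(cols)) {
    out[data[[cols[i]]] == 1] <- codes[[lvls[i]]]
  }
  out
}

# Toy data (made up): row 1 is low, row 2 is medium, row 3 has no dummy set
toy <- data.frame(
  age.low_tm1    = c(1L, 0L, 0L),
  age.medium_tm1 = c(0L, 1L, 0L),
  age.high_tm1   = c(0L, 0L, 0L)
)
encode_group(toy, "age")
# expected: 1 2 NA
```

Doing this by hand for every group (age, chol_test, diabetes_test, ...) is what I'd like to avoid.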
Basically the data looks like so:
Input
  race  gender age.low_tm1 age.medium_tm1 age.high_tm1 chol_test.low_tm1 chol_test.high_tm1
  <chr>  <int>       <int>          <int>        <int>             <int>              <int>
1 white      0           1              0            0                 0                  0
2 white      0           1              0            0                 0                  0
3 white      1           1              0            0                 0                  0
4 black      1           0              1            0                 0                  0
5 white      0           0              0            1                 0                  1
6 black      0           0              1            0                 1                  0
I want the output to look like so:
Expected Output:
  race  gender age chol_test
1 white      0   1        NA
2 white      0   1        NA
3 white      1   1        NA
4 black      1   2        NA
5 white      0   3         3
6 black      0   2         1
How could I do this? If possible, I'm looking for a dplyr solution similar to the ones posted in the question I linked. Sorry for any redundancies.
Data
df <- structure(list(race = c("white", "white", "white", "black", "white",
"black"), gender = c(0L, 0L, 1L, 1L, 0L, 0L), age.low_tm1 = c(1L,
1L, 1L, 0L, 0L, 0L), age.medium_tm1 = c(0L, 0L, 0L, 1L, 0L, 1L
), age.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L), chol_test.low_tm1 = c(0L,
0L, 0L, 0L, 0L, 1L), chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L,
0L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))