I have a dataset with many variables whose values need labels. I know how to add labels to the values one variable at a time, but I would like to use a loop that automatically assigns a label to the value 1 across several variables. A value of 1 indicates that someone selected an option (for instance, that they have depression), while 0 means they didn't select it. The variables are dsm_00 (where 1 should be labelled "no diagnosis"), dsm_01 (1 is depression), dsm_02 (1 is anxiety), and so on up to dsm_34.

I have created a list of names to be assigned:

labels <- list("no diagnosis", "depression", "anxiety", "bipolar", ....).

And I have code for doing it one variable at a time:

val_lab(mydat$dsm_00) = num_lab(" 
             1 no diagnosis
")

I'm not sure how I would incorporate it as a loop (I have always struggled with those). Any help would be appreciated!

  • post a [reproducible example.](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Eric May 04 '22 at 12:37

1 Answer

In this situation, you probably don't want to use a loop. An easier approach is to write a function that produces the desired label from a given value, and apply it to the whole column. A convenient way to do this is the mutate() function in the dplyr package. Here's an example:

labels <- list("no diagnosis", "depression", "anxiety", "bipolar")

# This is the function to contain your code for assigning labels
# based on values in your data set. Replace this with whatever
# logic you have. In this example, I've assumed that the values
# we are labeling are all integers we could use to look up labels.
get.label = Vectorize(
  function(diagnosis.code) {
    labels[[diagnosis.code]]
  })

# This package gives you mutate() and %>%
library(dplyr)

# Example data.
data = data.frame(diagnosis.codes = c(1, 3, 2, 2, 1))

# Create a new column "label" by applying your function to the
# values in another column.
data = data %>% mutate(label = get.label(diagnosis.codes))

Now if you look at your data frame, you should get the following:

> data
#   diagnosis.codes        label
# 1               1 no diagnosis
# 2               3      anxiety
# 3               2   depression
# 4               2   depression
# 5               1 no diagnosis
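
The example above looks labels up from a single column of diagnosis codes. If the data are instead laid out as in the question, with one 0/1 indicator column per diagnosis (dsm_00, dsm_01, ...), a short loop over the column names may be closer to what was asked. The following is only a sketch in base R: the toy `mydat` and the names `vars` and `value_label` are made up for illustration, and the label is stored as a plain "labels" attribute rather than through the labelling package used in the question.

```r
# Labels in column order: labels[[1]] belongs to dsm_00, labels[[2]] to dsm_01, ...
labels <- list("no diagnosis", "depression", "anxiety", "bipolar")

# Hypothetical toy data standing in for mydat (only four of the dsm_ columns).
mydat <- data.frame(
  dsm_00 = c(1, 0, 0),
  dsm_01 = c(0, 1, 1),
  dsm_02 = c(0, 1, 0),
  dsm_03 = c(0, 0, 1)
)

# Build the zero-padded column names dsm_00, dsm_01, ... to match the labels.
vars <- sprintf("dsm_%02d", seq_along(labels) - 1)

for (i in seq_along(vars)) {
  # Named vector such as c("depression" = 1): the label attached to the value 1.
  value_label <- setNames(1, labels[[i]])
  # Assumption: stored here as a plain attribute. With the question's labelling
  # package loaded, the analogous step would presumably be
  #   val_lab(mydat[[vars[i]]]) <- value_label
  attr(mydat[[vars[i]]], "labels") <- value_label
}
```

After the loop, `attr(mydat$dsm_01, "labels")` holds the named vector for that column. Note the offset of one between the list position and the column suffix, since R lists are indexed from 1 while the columns start at dsm_00.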