
I'm trying to create one variable out of multiple others, so I first created a factor version of each of the original variables, as shown below:

usl <- mutate(usl,
              unsafenat1_fct = case_when(unsafenat1 == 0 ~ "Not mentioned",
                                         unsafenat1 == 1 ~ "Yes mentioned"),
              unsafenat1_fct = as.factor(unsafenat1_fct))

usl <- mutate(usl,
              unsafenat2_fct = case_when(unsafenat2 == 0 ~ "Not mentioned",
                                         unsafenat2 == 1 ~ "Yes mentioned"),
              unsafenat2_fct = as.factor(unsafenat2_fct))

Then, from these (and several others), I want to create a single variable as shown below:

usl <- mutate(usl,
              unsafenat = ifelse(unsafenat1_fct == "Yes mentioned" | unsafenat2_fct == "Yes mentioned" |
                                   unsafenat3_fct == "Yes mentioned" | unsafenat4_fct == "Yes mentioned" |
                                   unsafenat5_fct == "Yes mentioned" | unsafenat6_fct == "Yes mentioned" |
                                   unsafenat7_fct == "Yes mentioned" | unsafenat8_fct == "Yes mentioned" |
                                   unsafenat9_fct == "Yes mentioned" | unsafenat10_fct == "Yes mentioned" |
                                   unsafenat11_fct == "Yes mentioned" | unsafenat12_fct == "Yes mentioned" |
                                   unsafenat97_fct == "Yes mentioned",
                                 "Yes mentioned", "Not mentioned"),
              unsafenat = as.factor(unsafenat))

Basically, I want the new variable to be "Yes mentioned" if any of the initial variables is "Yes mentioned", and "Not mentioned" if none of them is. However, when I run this code, only the "Yes mentioned" rows come out as expected; all the other rows end up as NA, and I don't know why.
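One plausible explanation, assuming some of the original `unsafenat` columns contain `NA` (or values other than 0 and 1, which `case_when()` turns into `NA`): in R, `|` returns `TRUE` as soon as any operand is `TRUE`, but `FALSE | NA` is `NA`, and `ifelse()` passes that `NA` straight through. The toy data here is hypothetical (three columns instead of thirteen) and only illustrates the pattern:

```r
library(dplyr)

# Hypothetical toy data: 1 = mentioned, 0 = not mentioned, NA = missing
toy <- data.frame(
  unsafenat1 = c(1, 0, 0),
  unsafenat2 = c(NA, 0, NA),
  unsafenat3 = c(0, 0, 0)
)

toy <- toy %>%
  mutate(
    # Build the *_fct columns in one step; values other than 0/1 (incl. NA) become NA
    across(
      starts_with("unsafenat"),
      ~ case_when(.x == 0 ~ "Not mentioned", .x == 1 ~ "Yes mentioned"),
      .names = "{.col}_fct"
    ),
    # Same logic as in the question: chained `|` comparisons
    unsafenat = ifelse(
      unsafenat1_fct == "Yes mentioned" |
        unsafenat2_fct == "Yes mentioned" |
        unsafenat3_fct == "Yes mentioned",
      "Yes mentioned", "Not mentioned"
    )
  )

# Row 1: TRUE  | NA    | FALSE -> TRUE  -> "Yes mentioned"
# Row 2: FALSE | FALSE | FALSE -> FALSE -> "Not mentioned"
# Row 3: FALSE | NA    | FALSE -> NA    -> NA
toy$unsafenat
```

If the real data follows this pattern, any row without a "Yes mentioned" but with at least one missing value will come out as NA.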

Here's a look at the variables: [screenshot not available]

When I tried a different approach, it became clear that the problem is that "Not mentioned" isn't recognized as a factor level, but I don't understand why.

usl$unsafeethn[usl$unsafeethn1_fct == "Not mentioned"] <- "Not mentioned"

Warning message:
In `[<-.factor`(`*tmp*`, usl$unsafeethnn1_fct == "Not mentioned", :
  invalid factor level, NA generated
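That warning is what R emits whenever a value that is not already one of a factor's levels is assigned into the factor. A likely scenario, assuming `usl$unsafeethn` is already a factor whose only level is "Yes mentioned" (for example, because it was created with `as.factor()` from a result containing only "Yes mentioned" and NA): "Not mentioned" cannot be stored, so NA is stored instead. A minimal reproduction with a hypothetical vector `f`:

```r
# Hypothetical factor built from a vector whose only non-missing value is "Yes mentioned"
f <- as.factor(c("Yes mentioned", NA, NA))
levels(f)  # only "Yes mentioned"; NA is not a level

# "Not mentioned" is not a level, so R warns
# ("invalid factor level, NA generated") and stores NA instead
f[2] <- "Not mentioned"

# Adding the missing level first makes the same assignment work
levels(f) <- c(levels(f), "Not mentioned")
f[2] <- "Not mentioned"

as.character(f)
levels(f)
```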

Nora17.06
  • Please can you post a reproducible example with sample data (e.g., by using `dput(your_data_frame)`)? – coffeinjunky Aug 05 '21 at 19:08
  • The likely issue is that factors have levels and labels, which are not fully specified when you create them. It may be useful to specify the different labels upfront when you define the variables as factors so that the other labels get recognized later. – coffeinjunky Aug 05 '21 at 19:10
  • Obviously, you could also avoid using factors altogether. – coffeinjunky Aug 05 '21 at 19:12
  • Hi, I'm sorry, I don't know how to use dput. When I check the levels, both seem to be specified, as it recognizes "Not mentioned" and "Yes mentioned", so I'm not sure what to do about it. How could I do this without using factors, then? – Nora17.06 Aug 05 '21 at 19:18
  • You use `dput` by entering `dput(usl)` into your console and add the output to your post. – coffeinjunky Aug 05 '21 at 19:48
  • Look here for examples: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – coffeinjunky Aug 05 '21 at 19:49
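Following up on the comments about avoiding factors, here is a minimal sketch under stated assumptions: the columns hold 1/0/NA, the data and names are hypothetical, and a row with no 1 (including an all-missing row) is treated as "Not mentioned". The result stays a plain character vector, and it is converted to a factor only at the very end with both levels listed explicitly, so neither label can go missing.

```r
library(dplyr)

# Hypothetical 0/1 columns with some missing values
usl_toy <- data.frame(
  unsafenat1 = c(1, 0, NA),
  unsafenat2 = c(0, 0, NA)
)

usl_toy <- usl_toy %>%
  mutate(
    # %in% returns FALSE (not NA) for missing values, so `any_yes` is never NA
    any_yes = (unsafenat1 %in% 1) | (unsafenat2 %in% 1),
    # Plain character result; no factor levels involved yet
    unsafenat = if_else(any_yes, "Yes mentioned", "Not mentioned"),
    # Optional: convert at the end, listing both levels explicitly
    unsafenat = factor(unsafenat, levels = c("Not mentioned", "Yes mentioned"))
  )

usl_toy$unsafenat
```

Whether an all-missing row should count as "Not mentioned" or stay NA is a substantive choice about the data, not something the code can decide.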

1 Answer


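You don't need to create a separate _fct variable for each column. If the 'unsafenat' columns contain 1/0 values, you can sum them row-wise and assign "Yes mentioned" only when there is at least one 1 in the row. Because of `na.rm = TRUE`, missing values are ignored, so a row containing only NAs gets "Not mentioned".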

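library(dplyr)

usl <- usl %>%
  # Sum only the original 0/1 columns (unsafenat1, ..., unsafenat12, unsafenat97);
  # a plain starts_with('unsafenat') would also pick up any *_fct factor columns
  mutate(unsafenat = ifelse(rowSums(select(., matches('^unsafenat[0-9]+$')),
                                    na.rm = TRUE) > 0, "Yes mentioned", "Not mentioned"))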
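To see what the row-sum approach does with missing values, here is a run on hypothetical toy data standing in for `usl` (three columns instead of thirteen). It uses the same `rowSums()` idea as the answer, with the column selection restricted to names like `unsafenat1` or `unsafenat97` as an assumption about the real naming:

```r
library(dplyr)

# Hypothetical toy data standing in for `usl`
usl <- data.frame(
  unsafenat1  = c(1, 0, NA),
  unsafenat2  = c(0, 0, NA),
  unsafenat97 = c(NA, 1, NA)
)

usl <- usl %>%
  mutate(unsafenat = ifelse(
    # Row sums of the 0/1 columns, skipping NAs; > 0 means at least one 1
    rowSums(select(., matches("^unsafenat[0-9]+$")), na.rm = TRUE) > 0,
    "Yes mentioned", "Not mentioned"
  ))

# Row 1: 1 + 0 (NA skipped) = 1 -> "Yes mentioned"
# Row 2: 0 + 0 + 1          = 1 -> "Yes mentioned"
# Row 3: all NA, sum is 0       -> "Not mentioned"
usl$unsafenat
```

Unlike the chained `|` comparisons, this never yields NA, because `rowSums(..., na.rm = TRUE)` returns 0 for an all-missing row.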
Ronak Shah