
I have an ordinal variable with the following categories:

very favorable (1), somewhat favorable (2), somewhat unfavorable (3), very unfavorable (4), don't know (8), refuse to answer (9)

I want my output binary variable to display:

favorable (1), unfavorable (0)

I want to do that by grouping "very favorable" and "somewhat favorable" into the new "favorable" outcome, coded as 1, and grouping "very unfavorable" and "somewhat unfavorable" into the new "unfavorable" outcome, coded as 0.

So basically I want to turn 1 and 2 into 1, and 3 and 4 into 0.

    Hi there and welcome to SO. Please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with a sample input (not posted in an image, consider using `dput()`) and your expected output. So we can understand your problem and think about a possible solution and verify it compared to your expected output. – Martin Gal Apr 18 '22 at 20:59
  • Hiii! Thank you for getting back to me so quickly. I edited the question so I hope it's clearer now what I want to do – Nouran Samer Apr 18 '22 at 21:07
  • Could you share a sample of your data used? `dput(head(YourData))` is quite useful here. – Martin Gal Apr 18 '22 at 21:27

2 Answers


There are lots of ways to do this; the easiest I can think of uses `%in%`.

For example, in base R:

data$column_to_recode = as.character(data$column_to_recode) # convert first: if this is a factor, assigning a value that isn't an existing level (e.g. 0) gives NA with a warning
data$column_to_recode[which(data$column_to_recode %in% c(1,2))] = 1
data$column_to_recode[which(data$column_to_recode %in% c(3,4))] = 0
data$column_to_recode[which(!(data$column_to_recode %in% c(0:4)))] = NA # or whatever else you want to do with values that aren't 0 through 4

Then if you really want bonus points you could coerce this back into a factor variable, but I find this is usually excessive.

data$column_to_recode = factor(data$column_to_recode,levels=c(0,1),ordered = TRUE)
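A self-contained run of the base R steps above, on a small hypothetical data frame (the column name `column_to_recode` comes from the answer; the values are made up for illustration and use the question's numeric codes). The `which()` calls are dropped here because `%in%` never returns NA, so a plain logical index behaves the same:

```r
# Hypothetical sample data using the question's codes (8 = don't know, 9 = refused)
data <- data.frame(column_to_recode = c(1, 2, 3, 4, 8, 9, 2, 4))

# Work on character values so new codes can be assigned freely
data$column_to_recode <- as.character(data$column_to_recode)
data$column_to_recode[data$column_to_recode %in% c(1, 2)] <- 1
data$column_to_recode[data$column_to_recode %in% c(3, 4)] <- 0
# Keep the new 0/1 codes; everything else (here 8 and 9) becomes NA
data$column_to_recode[!(data$column_to_recode %in% 0:4)] <- NA

# Optional: store as an ordered factor with levels 0 < 1
data$column_to_recode <- factor(data$column_to_recode, levels = c(0, 1), ordered = TRUE)
print(data$column_to_recode)
print(table(data$column_to_recode, useNA = "ifany"))
```

The 8 and 9 codes end up as NA, and the 0s survive because the last filter uses `0:4`.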

I couldn't tell from your original question whether the numeric codes were fine or whether you wanted to use character strings instead, but the same logic applies, e.g.:

data$column_to_recode[which(data$column_to_recode %in% c("very favorable (1)","somewhat favorable (2)"))] = "Favorable"

should get you what you need.
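For the character-label version, the same pattern needs one assignment per group. A sketch assuming the labels are stored exactly as written in the question (e.g. "very favorable (1)"); the sample vector is hypothetical. Matching against the untouched `labels` vector means the order of the assignments doesn't matter:

```r
# Hypothetical labels, spelled exactly as in the question
labels <- c("very favorable (1)", "somewhat unfavorable (3)", "don't know (8)",
            "somewhat favorable (2)", "very unfavorable (4)", "refuse to answer (9)")

recoded <- labels
recoded[labels %in% c("very favorable (1)", "somewhat favorable (2)")] <- "Favorable"
recoded[labels %in% c("somewhat unfavorable (3)", "very unfavorable (4)")] <- "Unfavorable"
# Anything not reassigned above (don't know, refused) becomes NA
recoded[!(recoded %in% c("Favorable", "Unfavorable"))] <- NA
print(recoded)
```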

  • That was extremely helpful! Thank you so much! – Nouran Samer Apr 18 '22 at 21:52
  • Numeric was fine and I just tried it; however, for some reason I only see "1" or "NA", and I can't find "0" in the categories. Any idea why that might have happened? – Nouran Samer Apr 18 '22 at 21:53
  • This is what I used: `new_df$Q11c = as.character(new_df$Q11c); new_df$Q11c[which(new_df$Q11c %in% c(1,2))] = 1; new_df$Q11c[which(new_df$Q11c %in% c(3,4))] = 0; new_df$Q11c[which(!(new_df$Q11c %in% c(1:4)))] = NA; new_df$Q11c = factor(new_df$Q11c,levels=c(0,1),ordered = TRUE)` – Nouran Samer Apr 18 '22 at 21:54
  • Ah, sorry, there was a misunderstanding in my answer; I just edited it to reflect this. The line `data$column_to_recode[which(!(data$column_to_recode %in% c(1:4)))] = NA` recodes all 0s to NA: after the 1s and 0s have been created, it turns the 0s into NAs. Changing `1:4` to `0:4` fixes this. – Andrew Taylor May 05 '22 at 14:34
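A small illustration of the bug described in the last comment: once 3 and 4 have been recoded to 0, a final filter of `1:4` treats those new 0s as invalid and sets them to NA. The input vector and the helper `recode_codes()` are hypothetical, written only for this demonstration:

```r
x <- c("1", "2", "3", "4", "8")   # hypothetical character codes

# Hypothetical helper: the answer's three steps, with the final filter as a parameter
recode_codes <- function(v, keep) {
  v[v %in% c(1, 2)] <- 1
  v[v %in% c(3, 4)] <- 0
  v[!(v %in% keep)] <- NA   # values outside `keep` are set to NA
  v
}

buggy <- recode_codes(x, keep = 1:4)  # the new 0s fall outside 1:4 and become NA
fixed <- recode_codes(x, keep = 0:4)  # the new 0s are kept
print(buggy)
print(fixed)
```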

Here's a dplyr solution using `case_when()`, which is really useful for creating dummy variables.

My starting data is as follows:

  # A tibble: 6 x 2
      participant category              
            <int> <chr>                 
    1           1 somewhat favorable (2)
    2           2 very unfavorable (4)  
    3           3 very favorable (1)    
    4           4 don't know (8)        
    5           5 very favorable (1)    
    6           6 somewhat favorable (2)

So, basically, when it detects (1) or (2) it converts the value to "favorable (1)", and when it detects (3) or (4) it converts it to "unfavorable (0)".

library(dplyr)
library(stringr)

data %>%  
  mutate(category = case_when(
    str_detect(category, "\\((1|2)\\)") ~ "favorable (1)",    # literal "(1)" or "(2)"
    str_detect(category, "\\((3|4)\\)") ~ "unfavorable (0)")) # no match (e.g. 8, 9) gives NA
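A note on the pattern: `str_detect()` treats it as a regular expression, where parentheses form a group, so `"(1)|(2)"` matches the digit 1 or 2 anywhere in the string rather than the literal text "(1)". That happens to work for the labels shown, but a label containing another digit would be misclassified. A quick check with base R's `grepl()`, whose default engine handles these simple patterns the same way (the label "rated 2 of 5 (4)" is made up for illustration):

```r
labels <- c("very favorable (1)", "don't know (8)", "rated 2 of 5 (4)")  # last label is hypothetical

# Unescaped: parentheses are a regex group, so any "1" or "2" anywhere matches
loose <- grepl("(1)|(2)", labels)
# Escaped: only the literal codes "(1)" or "(2)" match
strict <- grepl("\\((1|2)\\)", labels)
print(loose)
print(strict)
```

`fixed = TRUE` with separate patterns such as `"(1)"` would be another way to match the codes literally.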

Since (8) and (9) don't match any condition, `case_when()` returns NA for them. The final dataset is as follows:

# A tibble: 10 x 2
   participant category       
         <int> <chr>          
 1           1 favorable (1)  
 2           2 unfavorable (0)
 3           3 favorable (1)  
 4           4 NA             
 5           5 favorable (1)  
 6           6 favorable (1)  
 7           7 unfavorable (0)
 8           8 unfavorable (0)
 9           9 favorable (1)  
10          10 unfavorable (0)
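A self-contained version of the `case_when()` approach, rebuilt from the six starting rows shown above (it assumes the dplyr and stringr packages are installed). Adding `TRUE ~ NA_character_` makes the NA fallback for (8) and (9) explicit rather than implicit:

```r
library(dplyr)
library(stringr)

# The six starting rows shown in the answer
data <- tibble(
  participant = 1:6,
  category = c("somewhat favorable (2)", "very unfavorable (4)", "very favorable (1)",
               "don't know (8)", "very favorable (1)", "somewhat favorable (2)")
)

result <- data %>%
  mutate(category = case_when(
    str_detect(category, "\\((1|2)\\)") ~ "favorable (1)",
    str_detect(category, "\\((3|4)\\)") ~ "unfavorable (0)",
    TRUE ~ NA_character_   # (8), (9) and anything unexpected
  ))
print(result)
```

The output matches the first six rows of the final dataset above.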
Chamkrai