
I have a large dataset in which three variables are coded on a 9-category scale running from "extremely bad" to "extremely good". I need to convert these to numbers suitable for analysis. I was advised to use as.numeric(), but it assigns the numbers 1-9 in what looks like a random order rather than following the order of the scale. For example, "fair" should sit in the middle at 5 but has been assigned 2.

s_baldur
tazzad
  • Presumably, your variables are stored as factors? In which case, have a look at [this](https://stackoverflow.com/a/3418192/1552004) answer. Also, a [MCVE](https://stackoverflow.com/help/minimal-reproducible-example) makes it a lot easier to answer questions. – Dan Aug 20 '20 at 11:36
  • x <- data.frame(col = c("good", "not good")); ifelse(x$col == "good", 1, NA) – Dr. Flow Aug 20 '20 at 11:43

2 Answers


You can use a recoding list that maps each category label to its numeric value. Then index this list with each column's values (converted to character) to look up the numbers:

recode_as = list("bad"=-1,
                 "neutral"=0,
                 "good"=1)

data = data.frame(6:10,
                  "A"=c("good","good","neutral","bad","bad"),
                  "B"=c("bad","good","bad","good","neutral"),
                  "C"=c("good","good","good","good","bad"))

data$A = unlist(recode_as[as.character(data$A)])
data$B = unlist(recode_as[as.character(data$B)])
data$C = unlist(recode_as[as.character(data$C)])

Data before transformation:

  X6.10       A       B    C
1     6    good     bad good
2     7    good    good good
3     8 neutral     bad good
4     9     bad    good good
5    10     bad neutral  bad

Data after transformation:

  X6.10  A  B  C
1     6  1 -1  1
2     7  1  1  1
3     8  0 -1  1
4     9 -1  1  1
5    10 -1  0 -1
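
As a variation on the same lookup idea, here is a minimal sketch that uses a named numeric vector instead of a list and recodes all three columns in one step with lapply(). The names recode_map, cols, and the id column are illustrative choices, not part of the code above. One difference worth knowing: indexing a named vector with a label that is not in the map returns NA and keeps the vector length, whereas with the list approach unlist() drops the unmatched entry, so the result can end up shorter than the column.

```r
# Named numeric vector: names are the category labels, values are the codes.
recode_map <- c("bad" = -1, "neutral" = 0, "good" = 1)

data <- data.frame(
  id = 6:10,  # illustrative name for the otherwise unnamed first column
  A = c("good", "good", "neutral", "bad", "bad"),
  B = c("bad", "good", "bad", "good", "neutral"),
  C = c("good", "good", "good", "good", "bad")
)

cols <- c("A", "B", "C")

# Look up every label of each column in the map.
# as.character() makes this work for both character and factor columns;
# labels that are missing from the map become NA.
data[cols] <- lapply(data[cols], function(x) unname(recode_map[as.character(x)]))

print(data)
```

Checking for NA values after the recode (for example with anyNA(data[cols])) is a cheap way to catch misspelled or unexpected labels.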
Martin Wettstein

Here is an example that illustrates the problem and offers one solution.

Say you have a column in your data frame that is a factor (item1):

df <- data.frame(
  item1 = c("extremely bad", "good", "bad", "very good", "bad", "very bad"),
  stringsAsFactors = TRUE
)

If you only use as.numeric on the column item1, you will have:

as.numeric(df$item1)
[1] 2 3 1 5 1 4

These numbers follow the order of the factor levels, which looks random but is actually alphabetical:

levels(df$item1)
[1] "bad"           "extremely bad" "good"          "very bad"      "very good"

Instead, you should specify the order of your levels explicitly:

as.numeric(factor(df$item1, levels = c("extremely bad", 
                                       "very bad", 
                                       "bad", 
                                       "neutral", 
                                       "good", 
                                       "very good", 
                                       "extremely good")))
[1] 1 5 3 6 3 2

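In this case, "extremely bad" is the first level, so it is coded as 1. Levels that do not occur in the data, such as "neutral", still keep their position in the numbering, which is why "good" is coded as 5.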
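
Since the question mentions three variables on a 9-category scale, the same idea can be applied to several columns at once. The sketch below is only an illustration: the question names just "extremely bad", "fair", and "extremely good", so the other six labels in scale_levels are hypothetical, as are the column names item1 to item3. Replace them with the labels and columns in your own data.

```r
# Hypothetical 9-point scale, ordered from worst to best.
# Only "extremely bad", "fair", and "extremely good" come from the question.
scale_levels <- c("extremely bad", "very bad", "bad", "somewhat bad", "fair",
                  "somewhat good", "good", "very good", "extremely good")

df <- data.frame(
  item1 = c("fair", "extremely good", "bad"),
  item2 = c("very bad", "fair", "good"),
  item3 = c("extremely bad", "somewhat good", "fair"),
  stringsAsFactors = TRUE
)

items <- c("item1", "item2", "item3")

# Rebuild each factor with the explicit level order, then take the codes.
# Any label not listed in scale_levels becomes NA.
df[items] <- lapply(df[items], function(x) {
  as.numeric(factor(as.character(x), levels = scale_levels))
})

print(df)
```

With this level order "fair" is the fifth level, so it is coded as 5, which is the behaviour the question asks for.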

Ben