I have a large dataset in which 3 variables are coded on a 9-category scale running from "extremely bad" to "extremely good". I need to convert these into numbers suitable for analysis. I have been advised to use as.num686(), however, this assigns the numbers 1-9 seemingly at random instead of following the original scale order. For example, "fair" should sit in the middle of the scale at 5 but has been assigned 2.
- Presumably, your variables are stored as factors? In which case, have a look at [this](https://stackoverflow.com/a/3418192/1552004) answer. Also, an [MCVE](https://stackoverflow.com/help/minimal-reproducible-example) makes it a lot easier to answer questions. – Dan Aug 20 '20 at 11:36
- x <- data.frame(col = c("good", "not good")); ifelse(x$col == "good", 1, NA) – Dr. Flow Aug 20 '20 at 11:43
2 Answers
0
You can use a recoding list that maps each label to its numeric value. Then, index this list by each column's labels to look up the values:
recode_as = list("bad" = -1,
                 "neutral" = 0,
                 "good" = 1)
data = data.frame(6:10,
                  "A" = c("good", "good", "neutral", "bad", "bad"),
                  "B" = c("bad", "good", "bad", "good", "neutral"),
                  "C" = c("good", "good", "good", "good", "bad"))
data$A = unlist(recode_as[as.character(data$A)])
data$B = unlist(recode_as[as.character(data$B)])
data$C = unlist(recode_as[as.character(data$C)])
Data before transformation:
  X6.10       A       B    C
1     6    good     bad good
2     7    good    good good
3     8 neutral     bad good
4     9     bad    good good
5    10     bad neutral  bad
Data after transformation:
  X6.10  A  B  C
1     6  1 -1  1
2     7  1  1  1
3     8  0 -1  1
4     9 -1  1  1
5    10 -1  0 -1
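One caveat with the list lookup above: a label that has no entry in the list comes back as NULL, unlist() drops it, and the assignment then fails because the lengths no longer match. A minimal sketch of a variant, assuming a named numeric vector is acceptable in place of the list, returns NA for such labels instead. The `id` column name and the "excellent" label are made up for illustration.

```r
# Named numeric vector instead of a list: labels without a code become NA
recode_as <- c("bad" = -1, "neutral" = 0, "good" = 1)

data <- data.frame(
  id = 6:10,  # hypothetical column name
  A = c("good", "good", "neutral", "bad", "bad"),
  B = c("bad", "good", "bad", "good", "neutral"),
  C = c("good", "good", "good", "good", "excellent")  # "excellent" has no code
)

# Recode all three columns in one pass; unname() drops the label names
cols <- c("A", "B", "C")
data[cols] <- lapply(data[cols], function(x) unname(recode_as[as.character(x)]))

data$C
# [1]  1  1  1  1 NA
```

Checking for NA afterwards (for example with anyNA(data[cols])) is a quick way to spot misspelled or unexpected labels.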

Martin Wettstein
0
Here is an example that illustrates the problem and offers one solution.
Say you have a column in your data frame that is a factor (item1):
df <- data.frame(
  item1 = c("extremely bad", "good", "bad", "very good", "bad", "very bad"),
  stringsAsFactors = TRUE
)
If you only use as.numeric on the column item1, you will get:
as.numeric(df$item1)
[1] 2 3 1 5 1 4
This corresponds to your seemingly random (but actually alphabetical order) factor levels:
levels(df$item1)
[1] "bad" "extremely bad" "good" "very bad" "very good"
Instead, you should specify the order of your levels explicitly:
as.numeric(factor(df$item1, levels = c("extremely bad",
                                       "very bad",
                                       "bad",
                                       "neutral",
                                       "good",
                                       "very good",
                                       "extremely good")))
[1] 1 5 3 6 3 2
In this case, "extremely bad" is first in the order of levels, so is coded as 1.
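Since the question involves three variables, the same idea can be applied to several columns at once. The following is only a sketch: the column names item2 and item3, the sample values, and the 7-label scale (taken from the example above rather than the asker's actual 9 categories) are assumptions, so substitute your own labels in the intended order.

```r
# Scale labels in their intended order (extend to the full 9-point scale)
scale_levels <- c("extremely bad", "very bad", "bad", "neutral",
                  "good", "very good", "extremely good")

# Hypothetical data frame with three survey items
df <- data.frame(
  item1 = c("extremely bad", "good", "bad"),
  item2 = c("neutral", "very good", "extremely good"),
  item3 = c("very bad", "bad", "fair")  # "fair" is not in scale_levels
)

# Convert each item to its position on the scale
items <- c("item1", "item2", "item3")
df[items] <- lapply(df[items], function(x) {
  as.integer(factor(x, levels = scale_levels))
})

df$item3
# [1]  2  3 NA
```

Any label missing from scale_levels becomes NA, so an NA in the result usually points to a label that was left out of the level list or is spelled differently in the data.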

Ben