How can I completely replace categorical values with self-assigned numerical values in a whole dataframe? r

Question

I have a data frame that contains only the categorical values "Agree", "Disagree" and "Not Certain". I want to replace "Agree" with the numerical value 2, "Disagree" with 1 and "Not Certain" with 0.5, so that I can add them up and get a score.

I found that mapvalues only applies to factors and vectors, and I don't know how to use as.numeric in a way that lets me specify which value is assigned to each category. Additionally, I cannot actually replace the values in the data frame; my attempt just creates a new object, named like the data frame, with the three numbers in it.

Can you add a snippet of your data and the code you have tried at the end of your question to [make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? Use `dput(head(df))` where `df` is the name of your dataframe. — markus, Jul 28 '18 at 11:11
Related: [Replace values with numbers](https://stackoverflow.com/questions/51485802/replace-values-with-numbers) — markus, Jul 28 '18 at 11:16

score 0 · Answer 1 · answered Jul 28 '18 at 11:25

As you don't provide sample data, let's generate a vector of 10 random elements drawn from "Agree", "Disagree" and "Not Certain":

set.seed(2017)
ss <- sample(c("Agree", "Disagree", "Not Certain"), 10, replace = TRUE)

We specify a numeric value for every string in a named vector, and use match to map the string entries to those values:

val <- c("Agree" = 2, "Disagree" = 1, "Not Certain" = 0.5)
val[match(ss, names(val))]
#Not Certain    Disagree    Disagree       Agree Not Certain Not Certain
#        0.5         1.0         1.0         2.0         0.5         0.5
#      Agree    Disagree    Disagree       Agree
#        2.0         1.0         1.0         2.0

To sum the mapped values, we can do

sum(val[match(ss, names(val))])
#[1] 11.5
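
Since the question asks about a whole data frame rather than a single vector, the same named-vector lookup can be applied column by column. The following is a minimal sketch under assumptions: the data frame and its column names are made up for illustration, and the columns are deliberately factors to show that case as well.

```r
# Lookup table: names are the category labels, values are the scores
val <- c("Agree" = 2, "Disagree" = 1, "Not Certain" = 0.5)

# Hypothetical survey data (not from the question); columns are factors
df <- data.frame(
  FirstCol  = c("Agree", "Disagree", "Not Certain"),
  SecondCol = c("Not Certain", "Agree", "Agree"),
  stringsAsFactors = TRUE
)

# `df[] <-` replaces every column while keeping the data.frame structure.
# as.character() makes the lookup go by label, not by the factor's integer code.
df[] <- lapply(df, function(col) unname(val[as.character(col)]))

# One score per row
row_scores <- rowSums(df)
```

Assigning into `df[]` overwrites the columns of the existing data frame instead of creating a separate vector, which is the behaviour the question was missing.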

score 0 · Accepted Answer · answered Jul 28 '18 at 11:54

One can use dplyr::case_when here, since the OP wants to replace string values in all columns of a data.frame.

library(dplyr)
df %>% mutate_all(funs(case_when(
  . == "Agree" ~ 2,
  . == "Disagree" ~ 1,
  . == "Not Certain"  ~ 0.5
)))

#    FirstCol SecondCol ThirdCol
# 1       2.0       2.0      0.5
# 2       1.0       2.0      2.0
# 3       1.0       0.5      1.0
# 4       0.5       1.0      2.0
# 5       2.0       0.5      2.0
# 6       0.5       1.0      1.0
# 7       0.5       0.5      2.0
# 8       1.0       0.5      1.0
# 9       1.0       1.0      0.5
# 10      2.0       0.5      1.0
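
In newer dplyr releases (1.0.0 and later), `funs()` is deprecated and `mutate_all()` is superseded by `across()`. A sketch of the equivalent call, assuming such a dplyr version is installed and using a small made-up data frame instead of the sample data:

```r
library(dplyr)

# Hypothetical data for illustration only
df <- data.frame(
  FirstCol  = c("Agree", "Disagree", "Not Certain"),
  SecondCol = c("Not Certain", "Agree", "Disagree"),
  stringsAsFactors = FALSE
)

# across(everything(), ...) applies the recoding to every column;
# .x stands for the current column inside the lambda
df_num <- df %>%
  mutate(across(everything(), ~ case_when(
    .x == "Agree"       ~ 2,
    .x == "Disagree"    ~ 1,
    .x == "Not Certain" ~ 0.5
  )))
```

Values that match none of the conditions come out as `NA`, because `case_when()` has no default branch here.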

Data: Sample data

choices <- c("Agree", "Disagree", "Not Certain")
set.seed(1)

df <- data.frame(FirstCol = sample(choices, 10, replace = TRUE ),
                 SecondCol = sample(choices, 10, replace = TRUE ),
                 ThirdCol = sample(choices, 10, replace = TRUE ),
                 stringsAsFactors = FALSE)

df
#       FirstCol   SecondCol    ThirdCol
# 1        Agree       Agree Not Certain
# 2     Disagree       Agree       Agree
# 3     Disagree Not Certain    Disagree
# 4  Not Certain    Disagree       Agree
# 5        Agree Not Certain       Agree
# 6  Not Certain    Disagree    Disagree
# 7  Not Certain Not Certain       Agree
# 8     Disagree Not Certain    Disagree
# 9     Disagree    Disagree Not Certain
# 10       Agree Not Certain    Disagree

Thank you! The solution of MKR worked, but I noticed that I need to exclude the first two lines from that transformation. `df[2:14] %>% mutate_all(funs(case_when( . == "Agree" ~ 2, . == "Disagree" ~ 1, . == "Uncertain" ~ 0.5 )))` gives me just NA's in the first two rows... — Jana, Jul 28 '18 at 12:13
@Jana Yes, you can do that. If this solution worked for you, you can accept the answer by clicking the tick symbol to the left of the answer box. — MKR, Jul 28 '18 at 12:19
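
For the follow-up in the comments, here is a hedged sketch of recoding only some columns while leaving the others untouched. The column names (`id`, `group`, `Q1`, `Q2`) and the data are hypothetical. It also illustrates one possible source of `NA` values, which may or may not be what happened in the comment above: a label in the data that has no matching key in the lookup (for example "Uncertain" versus "Not Certain").

```r
val <- c("Agree" = 2, "Disagree" = 1, "Not Certain" = 0.5)

# Hypothetical data: two non-answer columns followed by two answer columns
df <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  Q1    = c("Agree", "Disagree", "Not Certain"),
  Q2    = c("Not Certain", "Agree", "Uncertain"),  # "Uncertain" is not in val
  stringsAsFactors = FALSE
)

# Recode only the answer columns; id and group stay as they are
answer_cols <- c("Q1", "Q2")
df[answer_cols] <- lapply(df[answer_cols],
                          function(col) unname(val[as.character(col)]))

# Labels without a key in val become NA; na.rm = TRUE skips them in the sum
df$score <- rowSums(df[answer_cols], na.rm = TRUE)
```

Checking `unique(unlist(df[answer_cols]))` before recoding is one way to spot labels that differ from the keys in `val`.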
