change all strings in dataframe to one of several different values r (NOT a duplicate)

Question

(see "IMPORTANT" note below - question is not solved by the answers to a similar post; it's not a duplicate)

I have a bunch of dataframes with a ton of vectors containing the same two likert scales that I need to recode from strings (currently as factors) to numeric values (1 to 5). Here are the scales:

Likert scale A:
---------------
Terrible      = 1
Below Average = 2
Average       = 3
Above Average = 4
Excellent     = 5


Likert Scale B:
---------------
Strongly disagree = 1
Somewhat disagree = 2
Meh               = 3
Somewhat agree    = 4
Strongly agree    = 5

As an example, here is HAVE1, one of the many datasets (each of which have a different number of vectors and different vector names) with these likert patterns:

Apples        Oranges       Bananas       ServiceGood       ShortTime
Excellent     Terrible      Average       Somewhat agree    Somewhat agree
Excellent                   Above Average Strongly agree    Somewhat agree
Above Average Terrible      Below Average                   Somewhat disagree
Excellent     Average       Below Average Meh               Strongly disagree
Below Average Terrible      Above Average Somewhat agree    Meh
…             …             …             …                 …

I need to replace the character strings with a numeric equivalent from the likert scale codes listed above. For example, the first 5 observations of WANT1 should look like this:

Apples  Oranges Bananas ServiceGood ShortTime
5       1       3       4           4
5       NA      4       5           4
4       1       2       NA          2
5       3       2       3           5
2       1       4       4           3
…       …       …       …           …

I'm looking for efficient ways to modify these values that don't require any invocation of vector names due to the differences in names and positions across all my HAVE dataframes. Any thoughts?

IMPORTANT: this problem is NOT solved by any of the examples provided in the question "Dictionary style replace multiple items"; every answer throws an error of some sort or fills the values of most variables with NA/replaces the wrong values.

For example, the following code...

map = setNames(c("Excellent", "Above Average", "Below Average", "Average", "Terrible"), 
               c("5", "4", "2", "3", "1"))
HAVE1[] <- map[unlist(HAVE1)]

Sets every value in my file to NA except for the empty cells, which it codes to "Terrible". The only solution from the other question that works in the "dictionaries" answer is the solution proposed in the question itself by the asker - it is this very approach that the asker hoped to avoid.

Note - the likert variables read in as factors from a csv file, which could contribute to this issue. — J.Q, May 29 '18 at 18:01

score 2 · Accepted Answer · answered May 29 '18 at 21:36

Here's a quick way using dplyr::mutate_all. str_replace allows us to insert missing values where there are empty strings, and then recode lets us quickly swap out the text levels with the desired numeric ones. This doesn't require you to know the names of the columns in each data frame at all. Note that this will deal with unrecognised options by setting them to NA, which should also alert you that there are misspellings in your data. You might also look into mutate_at if there are columns in each dataframe that you don't want to recode in this way.

To deal with the fact that you need to apply to multiple dataframes, if there are too many to manually apply this function to I would store them as elements of a list and then map across the list.

library(tidyverse)
df <- read_table(
"Apples        Oranges       Bananas       ServiceGood       ShortTime
Excellent     Terrible      Average       Somewhat agree    Somewhat agree
Excellent                   Above Average Strongly agree    Somewhat agree
Above Average Terrible      Below Average                   Somewhat disagree
Excellent     Average       Below Average Meh               Strongly disagree
Below Average Terrible      Above Average Somewhat agree    Meh"
)

df %>%
  mutate_all(~ str_replace(., "^$", NA_character_)) %>%
  mutate_all(
    .funs = ~ as.integer(recode(
      .x = .,
      "Terrible"      = 1,
      "Below Average" = 2,
      "Average"       = 3,
      "Above Average" = 4,
      "Excellent"     = 5,
      "Strongly disagree" = 1,
      "Somewhat disagree" = 2,
      "Meh"               = 3,
      "Somewhat agree"    = 4,
      "Strongly agree"    = 5
    ))
  )
#> # A tibble: 5 x 5
#>   Apples Oranges Bananas ServiceGood ShortTime
#>    <int>   <int>   <int>       <int>     <int>
#> 1      5       1       3           4         4
#> 2      5      NA       4           5         4
#> 3      4       1       2          NA         2
#> 4      5       3       2           3         1
#> 5      2       1       4           4         3

Created on 2018-05-29 by the reprex package (v0.2.0).

score 0 · Answer 2 · answered May 29 '18 at 17:53

I currently use a set of lapply() calls that use gsub() to look for a match on each string, then replace that match with the numeric value of the likert scale. For example:

WANT1 <- data.frame(lapply(HAVE1, function(a){gsub("Excellent",     5, a)}))
WANT1 <- data.frame(lapply(WANT1, function(b){gsub("Above Average", 4, b)}))
WANT1 <- data.frame(lapply(WANT1, function(d){gsub("Below Average", 2, d)}))
WANT1 <- data.frame(lapply(WANT1, function(c){gsub("Average",       3, c)}))
WANT1 <- data.frame(lapply(WANT1, function(e){gsub("Terrible",      1, e)}))

WANT1 <- data.frame(lapply(WANT1, function(a){gsub("Strongly agree",    5, a)}))
WANT1 <- data.frame(lapply(WANT1, function(b){gsub("Somewhat agree",    4, b)}))
WANT1 <- data.frame(lapply(WANT1, function(d){gsub("Somewhat disagree", 2, d)}))
WANT1 <- data.frame(lapply(WANT1, function(c){gsub("Meh",               3, c)}))
WANT1 <- data.frame(lapply(WANT1, function(e){gsub("Strongly Disagree", 1, e)}))

I nest this in a loop that iterates through all the HAVE dataframes and produces all the WANT dataframes, but I am looking for alternative approaches to accomplish the same goal. Any thoughts?

change all strings in dataframe to one of several different values r (NOT a duplicate)

2 Answers2