I want to recode values as follows: < 4 becomes -1, 4 becomes 0, and > 4 becomes 1. The recode should apply only to the variables listed in core.vars, while the rest of the variables stay in the data frame unchanged.

library(tidyverse)

# Move the row names into a column before converting: as_tibble() drops row names
temp.df <- rownames_to_column(mtcars, var = "cars_id") %>% as_tibble()
other.vars <- c('hp', 'drat', 'wt')
core.vars <- c('mpg', 'cyl', 'disp')
temp.df <- temp.df %>% mutate_if(is.integer, as.numeric)

I have tried a number of ways to implement this using case_when, mutate, and recode, but with no luck. recode requires a vector, so my thought was to create a vector with case_when or mutate for each variable of interest and then recode the values. All of these attempts failed:

temp.df <- temp.df %>% 
           mutate_at(.vars %in% (core.vars)), '< 4' = "-1", '4' = "0", '> 4' = "1")

Error: unexpected ',' in "temp.df <- temp.df %>% mutate_at(.vars %in% (core.vars)),"

temp.df <- temp.df %>% 
           mutate_at(vars(one_of(core.vars)), '< 4' = "-1", '4' = "0", '> 4' = "1")

Error in inherits(x, "fun_list") : argument ".funs" is missing, with no default

 temp.df <- temp.df %>% 
            mutate (temp.df, case_when (vars(one_of(core.vars)), recode ('< 4' = "-1", '4' = "0", '> 4' = "1")))

Error in mutate_impl(.data, dots) : Column temp.df is of unsupported class data.frame

 temp.df <- temp.df %>% 
            case_when (vars(one_of(core.vars)), recode ('< 4' = "-1", '4' = "0", '> 4' = "1"))

Error in recode.character(< 4 = "-1", 4 = "0", > 4 = "1") : argument ".x" is missing, with no default

temp.df <- temp.df %>% rowwise() %>% mutate_at(vars (core.vars),
                                            funs (case_when (
                                                recode(., '< 4' = -1, '0' = 0, '>4' = 1)
                                            ))) %>%
 ungroup()

Error in mutate_impl(.data, dots) : Evaluation error: Case 1 (recode(mpg,< 4= -1,0= 0,>4= 1)) must be a two-sided formula, not a double. In addition: Warning message: In recode.numeric(mpg, < 4 = -1, 0 = 0, >4 = 1) : NAs introduced by coercion

Previous questions on the forum cover how to do this for individual variables. However, I have 100 variables and 300 samples, so entering them individually, line by line, is not an option.

Ideally, I would like to avoid creating a separate data frame and joining it back, or creating multiple separate variables as mutate would do.

I am sure there is a for loop and/or ifelse method for this, but I was trying to use the tidyverse to achieve this. Any suggestions would be helpful.

A. Suliman
KP1

1 Answer

temp.df %>%
  mutate_at(vars(one_of(core.vars)), 
            function(x) case_when(
              x < 4 ~ -1,
              x == 4 ~ 0,
              x > 4 ~ 1
            ))

Output

# A tibble: 32 x 12
   cars_id             mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Mazda RX4             1     1     1   110  3.9   2.62  16.5     0     1     4     4
 2 Mazda RX4 Wag         1     1     1   110  3.9   2.88  17.0     0     1     4     4
 3 Datsun 710            1     0     1    93  3.85  2.32  18.6     1     1     4     1
 4 Hornet 4 Drive        1     1     1   110  3.08  3.22  19.4     1     0     3     1
 5 Hornet Sportabout     1     1     1   175  3.15  3.44  17.0     0     0     3     2
 6 Valiant               1     1     1   105  2.76  3.46  20.2     1     0     3     1
 7 Duster 360            1     1     1   245  3.21  3.57  15.8     0     0     3     4
 8 Merc 240D             1     0     1    62  3.69  3.19  20       1     0     4     2
 9 Merc 230              1     0     1    95  3.92  3.15  22.9     1     0     4     2
10 Merc 280              1     1     1   123  3.92  3.44  18.3     1     0     4     4
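
For readers on a recent dplyr (1.0.0 or later, where mutate_at is superseded), the same recode can probably be written with across(). This is a minimal sketch, not part of the original answer; the name recoded.df is made up for illustration, and the setup assumes the row names are moved into cars_id before converting to a tibble.

```r
library(dplyr)
library(tibble)

core.vars <- c("mpg", "cyl", "disp")

# Keep the car names as a regular column, then convert to a tibble
temp.df <- mtcars %>%
  rownames_to_column(var = "cars_id") %>%
  as_tibble()

# recoded.df is a hypothetical name; only the columns in core.vars are touched
recoded.df <- temp.df %>%
  mutate(across(all_of(core.vars), ~ case_when(
    .x < 4  ~ -1,
    .x == 4 ~ 0,
    .x > 4  ~ 1
  )))
```

Because the three cases are just the sign of the distance from 4, `sign(.x - 4)` inside across() should give the same result for numeric columns, with NA values staying NA in both versions.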
Jack Brookes
  • Amazing dude. Thanks. Exactly what I was looking for. – KP1 Jul 31 '18 at 19:01
  • Any suggestions on how to apply a function such as this one to that dataset? The numeric-data error is for the cars_id column, but I would like to keep that column. `dichotomize.dataset <- function(x) { return( as.numeric( x > median(x, na.rm = TRUE) ) ); }` `temp1.df <- temp.df %>% mutate_at(vars(one_of(other.vars)), dichotomize.dataset())` Error in median.default(x, na.rm = TRUE) : need numeric data In addition: Warning message: Error in median.default(x, na.rm = TRUE) : need numeric data – KP1 Jul 31 '18 at 19:17
  • Try without the () after the function name in mutate_at. You don't want to execute the function; you are just telling it which function to run on each of your columns. – Jack Brookes Jul 31 '18 at 19:34
  • Great tip. Appreciate your help. – KP1 Jul 31 '18 at 19:40
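
The tip in the comments above can be sketched as follows: the function object itself is handed to mutate_at, without parentheses, so dplyr calls it once per selected column. This is an illustrative sketch that reuses the names from the comment; the setup lines are assumptions added to make it self-contained.

```r
library(dplyr)
library(tibble)

other.vars <- c("hp", "drat", "wt")

# 1 if the value is above the column median, 0 otherwise
dichotomize.dataset <- function(x) {
  as.numeric(x > median(x, na.rm = TRUE))
}

# Assumed setup: car names kept as a character column
temp.df <- mtcars %>%
  rownames_to_column(var = "cars_id") %>%
  as_tibble()

# Pass the function by name (no parentheses); cars_id is not selected,
# so the character column is left untouched
temp1.df <- temp.df %>%
  mutate_at(vars(one_of(other.vars)), dichotomize.dataset)
```

Only hp, drat, and wt are rewritten; every other column, including cars_id, passes through unchanged.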