9

I need some help tidying my data. I'm trying to convert some integers to factors (but not all integers to factors). I think I can do it by selecting the variables in question, but how do I add them back to the original data set? For example, keeping the values NOT selected from my raw_data_tbl and using the mutated types from raw_data_tbl_int.

    library(dplyr)

    raw_data_tbl %>% 
    select_if(is.numeric) %>% 
    select(-c(contains("units"), PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, 
           REAL_PRICE_HHU, REBATE, RETURN_UNITS, UNITS_PER_CASE, Profit, STR_COST, DCC, 
           CREDIT_AMT)) %>% 
    mutate_if(is.numeric, as.factor)
massisenergy
willshelley
  • transformed_raw_data_tbl <- raw_data_tbl %>% mutate_at(vars(contains("units"), PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, REAL_PRICE_HHU, REBATE, RETURN_UNITS, Profit, STR_COST, DCC, CREDIT_AMT), funs(as.numeric)) %>% mutate_at(vars(-contains("units"), -PRO_ALLOW, -RTL_ACTUAL, -REAL_PRICE, -REAL_PRICE_HHU, -REBATE, -RETURN_UNITS, -Profit, -STR_COST, -DCC, -CREDIT_AMT), funs(as.factor)) – willshelley Feb 26 '19 at 14:25
  • This code got me where I wanted to be. Preserving the integer type of the variables that I wanted to keep as integers and changing the rest to factors. – willshelley Feb 26 '19 at 14:27

3 Answers

19

As of dplyr 1.0.0, released on CRAN 2020-06-01, the scoped functions mutate_at(), mutate_if() and mutate_all() have been superseded by the more general across(). This means you can stick with plain mutate(). The introductory blog post from April explains why it took so long to discover.

Toy example:

library(dplyr)

iris %>%
  mutate(across(c(Sepal.Width, 
                  Sepal.Length),
                factor))

In your case, you'd do this:

library(dplyr)

raw_data_tbl %>% 
  # Predicate functions such as is.numeric must be wrapped in where()
  mutate(across(c(where(is.numeric),
                  -contains("units"),
                  -c(PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, REAL_PRICE_HHU,
                     REBATE, RETURN_UNITS, UNITS_PER_CASE, Profit,
                     STR_COST, DCC, CREDIT_AMT)),
                factor))
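
Since raw_data_tbl isn't available here, this is a minimal sketch of the same selection on a made-up tibble. The name toy_tbl and its columns are assumptions for illustration only, not the asker's real data. It combines a type predicate with name-based exclusions using the `&` and `!` selection operators, which should behave the same as the `c(..., -...)` form:

```r
library(dplyr)

# Made-up stand-in for raw_data_tbl; column names are illustrative only
toy_tbl <- tibble(
  LOC_ID      = c(1L, 2L, 3L),
  UPC_CDE     = c(813L, 814L, 815L),
  sales_units = c(10L, 20L, 30L),
  REBATE      = c(0.5, 0.0, 1.5),
  STRS        = c("a", "b", "c")
)

# Convert numeric columns to factors, except those whose name contains
# "units" and except the explicitly listed REBATE column
toy_converted <- toy_tbl %>%
  mutate(across(
    where(is.numeric) & !contains("units") & !c(REBATE),
    as.factor
  ))

# Inspect the resulting column classes
sapply(toy_converted, class)
```

Because mutate() returns the whole data frame, the untouched columns stay in place and there is nothing to "add back" afterwards.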
meedstrom
10

You can use mutate_at instead. Here's an example using the iris dataframe:

library(dplyr)

iris_factor <- iris %>%
  mutate_at(vars(Sepal.Width, 
                 Sepal.Length), 
            funs(factor))

Edit 08/2020

As of dplyr 0.8.0, funs() is deprecated. Use list() instead, as in

library(dplyr)

iris_factor <- iris %>%
  mutate_at(vars(Sepal.Width, 
                 Sepal.Length), 
            list(factor))

And the proof:

> str(iris_factor)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
 $ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Pablo Pretzel
Dave Gruenewald
  • If using tidyverse then this will work: `iris %>% mutate_at(c('Sepal.Width', 'Sepal.Length'), factor)` or `iris %>% mutate_at(vars(Sepal.Width, Sepal.Length), factor)` – keithpjolley Aug 30 '21 at 16:54
8

Honestly, I'd do it like this:

library(dplyr)

df = data.frame("LOC_ID" = c(1,2,3,4),
                "STRS" = c("a","b","c","d"),
                "UPC_CDE" = c(813,814,815,816))

df$LOC_ID = as.factor(df$LOC_ID)
df$UPC_CDE = as.factor(df$UPC_CDE)
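
If there are more than a couple of columns, the same base R idea can probably be applied to a vector of column names instead of one assignment per column. A minimal sketch, assuming the data frame above; the helper name to_factor is mine, not part of any package:

```r
# Hypothetical data frame mirroring the one in this answer
df <- data.frame(
  LOC_ID  = c(1, 2, 3, 4),
  STRS    = c("a", "b", "c", "d"),
  UPC_CDE = c(813, 814, 815, 816)
)

# Names of the columns to convert; everything else is left untouched
to_factor <- c("LOC_ID", "UPC_CDE")

# lapply() returns a list of factors that is assigned back column by column
df[to_factor] <- lapply(df[to_factor], as.factor)

str(df)
```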
LetEpsilonBeLessThanZero