Conditionally mutate columns based on column class

Question

My question is based on a previous topic posted here: Mutating multiple columns in a data frame

Suppose I have a tibble as follows:

id   char_var_1   char_var_2   num_var_1   num_var_2  ... x_var_n
1       ...           ...         ...         ...           ...
2       ...           ...         ...         ...           ...
3       ...           ...         ...         ...           ...

where id is the key and char_var_x is a character variable and num_var_x is a numerical variable. I have 346 columns in total and I want to write a function that scales all the numerical variables except the id column. I'm looking for an elegant way to mutate these columns using pipes and dplyr functions.

Obviously the following works for all numeric variables:

pre_process_data <- function(dt)
{
  # scale numeric variables
  dt %>% mutate_if(is.numeric, scale)
}

But I'm looking for a way to exclude id column from scaling and retain the original values and at the same time scale all other numerical variables. Is there an elegant way to do this?

`dt %>% mutate(id = as.character(id)) %>% mutate_if(is.numeric, scale) %>% mutate(id = as.numeric(id))` ? — Jake Kaupp, Jan 23 '18 at 17:48

zx8754 · Accepted Answer · 2018-01-24T10:27:06.647

6

Try below, answer is similar to select_if post:

library(dplyr)

# Using @Psidom's example data: https://stackoverflow.com/a/48408027

df %>%
  mutate_if(function(col) is.numeric(col) &
              !all(col == .$id), scale)
#   id a  b
# 1  1 a -1
# 2  2 b  0
# 3  3 c  1

edited Jan 24 '18 at 10:27

answered Jan 23 '18 at 17:59

zx8754

52,746
12
114
209

1

I think this doesn't work in a situation where there is another column for which a number of rows share the same value with the `id` column. `all(col != .$id)` will evaluate to `FALSE` in this case. I think it should be: `df %>% mutate_if(function(col) is.numeric(col) & !all(col == .$id), scale)` – tadzi Jan 24 '18 at 09:16

score 2 · Answer 2 · answered Jan 23 '18 at 17:57

Not a canonical way to do this, but with a little bit hack, you can do this with mutate_at by making the integer indices of columns to mutate using which with manually constructed column selecting conditions:

df = data.frame(id = 1:3, a = letters[1:3], b = 2:4)
df %>% 
    mutate_at(vars(which(sapply(., is.numeric) & names(.) != 'id')), scale)

#  id a  b
#1  1 a -1
#2  2 b  0
#3  3 c  1

score 1 · Answer 3 · answered Jan 24 '18 at 11:02

1

How about the "make the column your interested a character, then change it back approach?"

  dt %>%
    mutate(id = as.character(id)) %>% 
    mutate_if(is.numeric, scale) %>% 
    mutate(id = as.numeric(id))

answered Jan 24 '18 at 11:02

Jake Kaupp

7,892
2
26
36

score 1 · Answer 4 · answered Feb 03 '21 at 09:09

1

you can use dplyr's across

df %>% mutate(across(c(where(is.numeric),-id),scale))

answered Feb 03 '21 at 09:09

Adam Lee Perelman

820
8
9

Conditionally mutate columns based on column class

4 Answers4

Linked