91

I have a data.frame:

dat <- data.frame(fac1 = c(1, 2),
                  fac2 = c(4, 5),
                  fac3 = c(7, 8),
                  dbl1 = c('1', '2'),
                  dbl2 = c('4', '5'),
                  dbl3 = c('6', '7')
                  )

To change data types I can use something like

l1 <- c("fac1", "fac2", "fac3")
l2 <- c("dbl1", "dbl2", "dbl3")
dat[, l1] <- lapply(dat[, l1], factor)
dat[, l2] <- lapply(dat[, l2], as.numeric)

with dplyr

dat <- dat %>% mutate(
    fac1 = factor(fac1), fac2 = factor(fac2), fac3 = factor(fac3),
    dbl1 = as.numeric(dbl1), dbl2 = as.numeric(dbl2), dbl3 = as.numeric(dbl3)
)

Is there a more elegant (shorter) way in dplyr?

Thanks, Christof

Eric Krantz
ckluss

9 Answers

79

Edit (as of 2021-03)

As also pointed out in Eric's answer, the mutate_[at|if|all] functions have been superseded by a combination of mutate() and across(). For reference, here are the across() counterparts of the examples in the original answer (see below):

# convert all factor to character
dat %>% mutate(across(where(is.factor), as.character))

# apply function (change encoding) to all character columns 
dat %>% mutate(across(where(is.character), 
               function(x){iconv(x, to = "ASCII//TRANSLIT")}))

# substitute 0 for all NA in numeric columns
dat %>% mutate(across(where(is.numeric), function(x) tidyr::replace_na(x, 0)))
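
Applied to the data frame from the question, the same pattern might look like the sketch below. It assumes dplyr 1.0.0 or later and that the name prefixes fac and dbl reliably identify the columns to convert. The name `converted` and the explicit stringsAsFactors = FALSE are additions for illustration, not part of the original question.

```r
library(dplyr)

# stringsAsFactors = FALSE keeps the dbl columns as character on R < 4.0 too
dat <- data.frame(fac1 = c(1, 2), fac2 = c(4, 5), fac3 = c(7, 8),
                  dbl1 = c('1', '2'), dbl2 = c('4', '5'), dbl3 = c('6', '7'),
                  stringsAsFactors = FALSE)

# select columns by name prefix and convert each group in a single mutate() call
converted <- dat %>%
  mutate(across(starts_with("fac"), factor),
         across(starts_with("dbl"), as.numeric))

str(converted)
```

If the columns cannot be told apart by name, all_of() with a character vector of column names should work in place of starts_with().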

Original answer

Since Nick's answer is deprecated by now and Rafael's comment is really useful, I want to add this as an Answer. If you want to change all factor columns to character use mutate_if:

dat %>% mutate_if(is.factor, as.character)

Other functions work as well. For instance, I used iconv to change the encoding of all character columns:

dat %>% mutate_if(is.character, function(x){iconv(x, to = "ASCII//TRANSLIT")})

or to substitute all NA by 0 in numeric columns:

dat %>% mutate_if(is.numeric, function(x){ifelse(is.na(x), 0, x)})
loki
60

You can use the standard evaluation version of mutate_each (which is mutate_each_) to change the column classes:

dat %>% mutate_each_(funs(factor), l1) %>% mutate_each_(funs(as.numeric), l2)
talat
  • 8
    In this case you could also use `starts_with()` – hadley Dec 28 '14 at 20:46
  • 3
    Thanks for your suggestion, @hadley. So for the first case that would be `dat %>% mutate_each(funs(factor), starts_with("fac"))` to convert all columns starting with the string "fac" to factor. – talat Dec 28 '14 at 20:55
  • @hadley Is it possible to make the same operation, but in a way that would transform all columns coming after the one the user chooses to get transformed? Not sure my question was clear. – iouraich Feb 14 '16 at 13:40
  • 15
    `mutate_each` is deprecated in latest version, use `mutate_at` instead... – Pablo Casas Sep 26 '17 at 20:04
51

EDIT - The syntax of this answer has been deprecated; loki's updated answer is more appropriate.

ORIGINAL-

From the bottom of ?mutate_each (at least in dplyr 0.5), it looks like that function, as used in @docendo discimus's answer, will be deprecated and replaced with the more flexible alternatives mutate_if, mutate_all, and mutate_at. The one most similar to what @hadley mentions in his comment is probably mutate_at. Note that the order of the arguments is reversed compared to mutate_each, and that vars() uses select()-like semantics, which I interpret to mean the ?select_helpers functions.

dat %>% mutate_at(vars(starts_with("fac")),funs(factor)) %>%   
  mutate_at(vars(starts_with("dbl")),funs(as.numeric))

But mutate_at can take column numbers instead of a vars() argument, and after reading through this page, and looking at the alternatives, I ended up using mutate_at but with grep to capture many different kinds of column names at once (unless you always have such obvious column names!)

dat %>% mutate_at(grep("^(fac|fctr|fckr)",colnames(.)),funs(factor)) %>%
  mutate_at(grep("^(dbl|num|qty)",colnames(.)),funs(as.numeric))

I was pretty excited about figuring out mutate_at + grep, because now one line can work on lots of columns.

EDIT - now I see matches() in among the select_helpers, which handles regex, so now I like this.

dat %>% mutate_at(vars(matches("fac|fctr|fckr")),funs(factor)) %>%
  mutate_at(vars(matches("dbl|num|qty")),funs(as.numeric))

Another generally-related comment - if you have all your date columns with matchable names, and consistent formats, this is powerful. In my case, this turns all my YYYYMMDD columns, which were read as numbers, into dates.

dat %>% mutate_at(vars(matches("_DT$")),funs(as.Date(as.character(.),format="%Y%m%d")))
Rafael Zayas
  • If you are changing from factor to number, keep in mind `as.numeric` on its own does not work. Factors are stored internally as integers with a table to give the factor level labels. Just using `as.numeric` will only give the internal integer codes. To change from factor to numeric the code should be slightly tweaked. `mutate_at(vars(matches("dbl|num|qty")),function(x) as.numeric(as.character(x)))` – camnesia Mar 12 '19 at 15:35
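
To illustrate the point made in the comment above, here is a minimal base R sketch. The factor `f` is a made-up example, not data from the question.

```r
# hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

as.numeric(f)                # internal integer codes: 1 2 1
as.numeric(as.character(f))  # the labelled values: 10 20 10
```

Going through as.character() first converts the labels rather than the codes, which is usually what is wanted.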
19

dplyr's across() function has superseded the _if, _at, and _all variants of mutate. See vignette("colwise").

dat %>%
  mutate(across(all_of(l1), as.factor),
         across(all_of(l2), as.numeric))
Eric Krantz
  • 1
    similarly, using column indices: `dat <- dat %>% mutate(across(all_of(names(dat)[1:3]), as.factor), across(all_of(names(dat)[4:6]), as.numeric))` – Brian D Oct 28 '20 at 21:17
9

It's a one-liner with mutate_at:

dat %>% mutate_at(l1, factor) %>% mutate_at(l2, as.numeric)
Cettt
nexonvantec
4

A more general way of achieving column type transformation is as follows:

If you want to transform all your factor columns to character columns, e.g., this can be done using one pipe:

df %>%  mutate_each_( funs(as.character(.)), names( .[,sapply(., is.factor)] ))
Nick
2

For future readers: if you are OK with readr guessing the column types, you can convert the column types of an entire data frame, as if you were originally reading it in with readr and col_guess(), using type_convert(). Note that it only re-parses character columns:

library(tidyverse)
df %>% type_convert()
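
As a rough sketch of what this does on data shaped like the question's (a cut-down two-column version is assumed here, and the name `guessed` is only for illustration):

```r
library(readr)

# cut-down version of the question's data: one numeric and one character column
dat <- data.frame(fac1 = c(1, 2), dbl1 = c('1', '2'),
                  stringsAsFactors = FALSE)

# only the character column dbl1 is re-parsed; fac1 is already numeric and is left alone
guessed <- type_convert(dat)

str(guessed)
```

Since the guessing never produces factors, the fac columns would still need an explicit conversion afterwards.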
Leo
1

Or maybe even simpler, with convert from hablar:

library(hablar)

dat %>% 
  convert(fct(fac1, fac2, fac3),
          num(dbl1, dbl2, dbl3))

or combined with tidyselect helpers:

dat %>% 
  convert(fct(contains("fac")),
          num(contains("dbl")))
davsjob
0

Try this

df[,1:11] <- sapply(df[,1:11], as.character)
ah bon
Rupesh Kumar