398

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-

I'd like to concatenate the rows of this data frame (this will be another question). But look:

> class(bob$phenotype)
[1] "factor"

Bob's columns are factors. So, for example:

> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"      
[3] "c(29, 29, 29, 30, 30, 30)"

I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.

Strangely I can go through the columns of bob by hand, and do

bob$phenotype <- as.character(bob$phenotype)

which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?

Bonus question: why does the manual approach work?

GSee
Mike Dewar

18 Answers

389

Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:

bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

This will convert all variables to class "character"; if you want to convert only factors, see Marek's solution below.

As @hadley points out, the following is more concise.

bob[] <- lapply(bob, as.character)

In both cases, lapply outputs a list; however, owing to the magical properties of R, the use of [] in the second case keeps the data.frame class of the bob object, thereby eliminating the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
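
A minimal sketch of that difference, using a small made-up data frame (`toy`, its columns, and its row names are illustrative and not taken from the question):

```r
# Hypothetical stand-in for bob, with explicit row names.
toy <- data.frame(
  phenotype = factor(c("25-", "25+")),
  exclusion = factor(c("Gr1-", "Gr1-")),
  row.names = c("GSM1", "GSM2")
)

# Plain assignment would replace the data frame with a bare list.
as_list <- lapply(toy, as.character)

# Assigning into kept[] fills the existing columns, so the
# data.frame class and the row names survive.
kept <- toy
kept[] <- lapply(toy, as.character)

class(as_list)       # "list"
class(kept)          # "data.frame"
rownames(kept)       # "GSM1" "GSM2"
sapply(kept, class)  # both columns are "character"
```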

Community
Shane
  • 32
    Shane, that'll also turn numerical columns into character. – Dirk Eddelbuettel May 17 '10 at 18:38
  • @Dirk: That's true, although it isn't clear whether that's a problem here. Clearly, creating things correctly up front is the best solution. I don't think that it's *easy* to automatically convert data types across a data frame. One option is to use the above but then use `type.convert` after casting everything to `character`, then recast `factors` back to `character` again. – Shane May 17 '10 at 18:56
  • This seems to discard row names. – piccolbo Jul 22 '13 at 17:04
  • To better understand `lapply()` and friends, you may want to see this [useful summary](http://stackoverflow.com/q/3505701/181638) – Assad Ebrahim Apr 28 '14 at 10:18
  • 2
    @piccolbo did you use `bob[] <- ` in the example or `bob <- `?; the first keeps the data.frame; the second changes the data.frame to a list, dropping the rownames. I will update the answer – David LeBauer Dec 11 '14 at 21:51
  • @david you are correct, not sure what evidence I had for my first comment – piccolbo Dec 12 '14 at 03:29
  • 7
    A variant that only converts factor columns to character using an anonymous function: `iris[] <- lapply(iris, function(x) if (is.factor(x)) as.character(x) else {x})` – Stefan F Jul 05 '17 at 18:09
  • iris[] <- sapply(iris, function(x) if (is.factor(x)) as.character(x) else x) Sapply also gives the same output on presence of [] – Therii May 27 '19 at 16:42
  • How did you figure out that ```[] = lapply``` returns a dataframe – Frank Jan 25 '21 at 15:30
345

To replace only factors:

i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)
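
As a quick check that the other column types are left alone, here is a sketch on a made-up mixed-type data frame (`mixed` and its columns are hypothetical):

```r
# Hypothetical data frame in which only `label` is a factor.
mixed <- data.frame(
  label = factor(c("a", "b", "a")),
  count = c(1L, 2L, 3L),
  score = c(0.5, 1.5, 2.5)
)

# One TRUE/FALSE per column, TRUE where the column is a factor.
is_fac <- sapply(mixed, is.factor)

# Convert only the flagged columns in place.
mixed[is_fac] <- lapply(mixed[is_fac], as.character)

sapply(mixed, class)
```

After this runs, `label` is character while `count` stays integer and `score` stays numeric.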

dplyr 0.5.0 introduced the function mutate_if:

library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob

...and dplyr 1.0.0 superseded it with across:

library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob

Package purrr from RStudio gives another alternative:

library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob
Marek
43

The global option

stringsAsFactors: The default setting for arguments of data.frame and read.table.

may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options).
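
For context, R 4.0.0 changed the default of `stringsAsFactors` to `FALSE`, so on current versions strings stay character unless you ask otherwise. A small sketch that passes the argument explicitly, so it behaves the same on older and newer versions (the data are made up):

```r
# Same input, two explicit settings of stringsAsFactors.
as_chr <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)
as_fac <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)

class(as_chr$x)  # "character"
class(as_fac$x)  # "factor"
```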

micstr
Dirk Eddelbuettel
26

If you understand how factors are stored, you can avoid using apply-based functions to accomplish this, which isn't at all to imply that the apply solutions don't work well.

Factors are stored as integer codes that index into a character vector of 'levels'. This can be seen if you convert a factor to numeric. So:

> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d

> as.numeric(fact)
[1] 1 2 1 3

The numbers returned in the last line correspond to the levels of the factor.

> levels(fact)
[1] "a" "b" "d"

Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:

> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"

This also works for numeric values, provided you wrap your expression in as.numeric().

> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
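
The example above happens to use values that equal their own level codes, which hides the usual pitfall. A sketch with made-up values that differ from the codes, also showing the `as.numeric(levels(f))[f]` variant mentioned in the comment on this answer:

```r
# Made-up values chosen so that codes and values differ.
num_fact <- factor(c(10, 20, 30, 60, 50, 40))

codes  <- as.numeric(num_fact)                    # internal codes: 1 2 3 6 5 4
values <- as.numeric(levels(num_fact))[num_fact]  # original values: 10 20 30 60 50 40
same   <- as.numeric(as.character(num_fact))      # also 10 20 30 60 50 40

codes
values
identical(values, same)  # TRUE
```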
De Novo
Kikapp
  • 1
    This answer does not address the problem, which is how do I convert *all* of the factor columns in my data frame to character. `as.character(f)`, is better in both readability and efficiency to `levels(f)[as.numeric(f)]`. If you wanted to be clever, you could use `levels(f)[f]` instead. Note that when converting a factor with numeric values, you do get some benefit from `as.numeric(levels(f))[f]` over, e.g., `as.numeric(as.character(f))`, but this is because you only have to convert the levels to numeric and then subset. `as.character(f)` is just fine as it is. – De Novo Mar 19 '19 at 04:16
22

If you want a new data frame bobc where every factor vector in bobf is converted to a character vector, try this:

bobc <- rapply(bobf, as.character, classes="factor", how="replace")

If you then want to convert it back, you can create a logical vector of which columns are factors, and use that to selectively apply factor:

f <- sapply(bobf, is.factor)
bobc[f] <- lapply(bobc[f], factor)
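
A round-trip sketch on made-up data (`bobf` here is a hypothetical two-column data frame). The `as.data.frame()` wrapper is a precaution added for this sketch, in case `rapply` hands back a plain list on your R version:

```r
# Hypothetical input: one factor column, one numeric column.
bobf <- data.frame(
  phenotype = factor(c("25-", "25+", "25-")),
  n = c(1, 2, 3)
)

# Factor -> character, touching only factor columns.
bobc <- rapply(bobf, as.character, classes = "factor", how = "replace")
bobc <- as.data.frame(bobc, stringsAsFactors = FALSE)

# Character -> factor again, using the original frame to find the columns.
f <- sapply(bobf, is.factor)
back <- bobc
back[f] <- lapply(back[f], factor)

sapply(bobc, class)
sapply(back, class)
```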
scentoni
  • 2
    +1 for doing only what was necessary (i.e. not converting the entire data.frame to character). This solution is robust to a data.frame that contains mixed types. – Joshua Ulrich Aug 01 '13 at 21:42
  • 3
    This example should be in the `Examples' section for rapply, like at: http://stat.ethz.ch/R-manual/R-devel/library/base/html/rapply.html . Anyone know how to request that that be so? – mpettis Aug 02 '13 at 03:13
  • If you want to end up with a data frame, simply wrap the rapply in a data.frame call (with the stringsAsFactors argument set to FALSE) – Taylored Web Sites Apr 04 '16 at 19:44
16

I typically make this function a part of all my projects. Quick and easy.

unfactorize <- function(df){
  # Convert each factor column to character; leave other columns untouched.
  for(i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  return(df)
}
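
A usage sketch, with the helper restated so the block runs on its own (it tests columns with `is.factor`; `toy` is made-up data):

```r
# Restated helper: convert factor columns to character, keep the rest.
unfactorize <- function(df) {
  for (i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  df
}

# Hypothetical input with one factor and one integer column.
toy <- data.frame(g = factor(c("x", "y")), n = 1:2)
out <- unfactorize(toy)

sapply(toy, class)  # g is still a factor in the original
sapply(out, class)  # g is character in the result, n is unchanged
```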
Omar Wagih
11

Another way is to convert it using apply:

bob2 <- apply(bob, 2, as.character)

And a better one (the previous result is of class 'matrix'):

bob2 <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
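
A sketch of what the two calls return on a made-up data frame (`toy` is hypothetical). Note that going through a matrix turns every column into character, including numeric ones:

```r
# Hypothetical input: one factor column, one numeric column.
toy <- data.frame(g = factor(c("x", "y")), n = c(1.5, 2.5))

m <- apply(toy, 2, as.character)                             # character matrix
d <- as.data.frame(as.matrix(toy), stringsAsFactors = FALSE) # data frame

is.matrix(m)      # TRUE
sapply(d, class)  # both columns are "character", n included
```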
gd047
9

Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.

Try this:

bob2 <- data.frame(bob, stringsAsFactors = FALSE)

Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).
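
A minimal sketch of why the call above leaves things unchanged, on made-up data: the argument only affects character vectors being turned into columns, and an existing factor passes through as a factor.

```r
# Hypothetical data frame that already holds a factor column.
toy <- data.frame(g = factor(c("x", "y")))

# Re-wrapping with stringsAsFactors = FALSE does not undo the factor.
toy2 <- data.frame(toy, stringsAsFactors = FALSE)

class(toy2$g)  # "factor"
```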

Matt Parker
  • 1
    This does work, if he sets it when creating `bob` to begin with (but not after the fact). – Shane May 17 '10 at 17:18
  • Right. Just wanted to be clear that this doesn't solve the problem, per se - but thanks for noting that it does prevent it. – Matt Parker May 17 '10 at 17:34
8

Or you can try transform:

newbob <- transform(bob, phenotype = as.character(phenotype))

Just be sure to list every factor you'd like to convert to character.

Or you can do something like this and kill all the pests with one blow:

newbob_char <- as.data.frame(lapply(bob[sapply(bob, is.factor)], as.character), stringsAsFactors = FALSE)
newbob_rest <- bob[!(sapply(bob, is.factor))]
newbob <- cbind(newbob_char, newbob_rest)

It's not a good idea to shove the data in code like this; I could do the sapply part separately (actually, it's much easier to do it like that), but you get the point... I haven't checked the code, 'cause I'm not at home, so I hope it works! =)

This approach, however, has a downside... you must reorder the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style code writing"...

So there... =)

aL3xa
7

When you create your data frame, include stringsAsFactors = FALSE to avoid all misunderstandings.

6

If you use the data.table package for operations on a data.frame, the problem does not arise.

library(data.table)
dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 

If you already have factor columns in your dataset and you want to convert them to character, you can do the following.

library(data.table)
dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
sapply(dt, class)
#     col1      col2 
# "factor" "integer" 
upd.cols = sapply(dt, is.factor)
dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 
jangorecki
  • DT circumvents the sapply fix proposed by Marek: `In [<-.data.table(*tmp*, sapply(bob, is.factor), : Coerced 'character' RHS to 'double' to match the column's type. Either change the target column to 'character' first (by creating a new 'character' vector length 1234 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'double' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.` It's easier to fix the DF and recreate the DT. – Matt Chambers Aug 03 '16 at 17:49
3

This works for me. I finally figured out a one-liner:

df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
user1617979
3

The new function "across" was introduced in dplyr version 1.0.0. It supersedes the scoped variants (_if, _at, _all); see the official dplyr documentation.

library(dplyr)
bob <- bob %>% 
       mutate(across(where(is.factor), as.character))
radhikesh93
2

You can use convert from hablar, which gives readable syntax compatible with tidyverse pipes:

library(dplyr)
library(hablar)

df <- tibble(a = factor(c(1, 2, 3, 4)),
             b = factor(c(5, 6, 7, 8)))

df %>% convert(chr(a:b))

which gives you:

  a     b    
  <chr> <chr>
1 1     5    
2 2     6    
3 3     7    
4 4     8   
davsjob
2

With the dplyr package loaded, use

bob <- bob %>% mutate_at("phenotype", as.character)

if you only want to change the phenotype column specifically.

nexonvantec
1

This function does the trick

df <- stacomirtools::killfactor(df)
Cedric
0

Maybe a newer option?

library("tidyverse")

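# group_by_if() converts the factor columns but also groups by them,
# so ungroup() afterwards to get an ordinary ungrouped result.
bob <- bob %>% group_by_if(is.factor, as.character) %>% ungroup()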
rachelette
0

This works by converting every column to character and then converting the columns that look numeric back to numeric:

makenumcols<-function(df){
  df<-as.data.frame(df)
  df[] <- lapply(df, as.character)
  cond <- apply(df, 2, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  })
  numeric_cols <- names(df)[cond]
  df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)
  return(df)
}
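
A usage sketch on made-up data. The function is restated as a slight variant so the block runs on its own; it uses `vapply` and `lapply` where the version above uses `apply` and `sapply`, which is a choice made for this sketch only:

```r
# Variant of makenumcols: everything to character, then numeric-looking
# columns back to numeric.
makenumcols <- function(df) {
  df <- as.data.frame(df)
  df[] <- lapply(df, as.character)
  # TRUE for columns whose non-missing values all parse as numbers.
  cond <- vapply(df, function(x) {
    x <- x[!is.na(x)]
    all(!is.na(suppressWarnings(as.numeric(x))))
  }, logical(1))
  df[cond] <- lapply(df[cond], as.numeric)
  df
}

# Hypothetical input: both columns start out as factors.
toy <- data.frame(id = factor(c("a", "b")), n = factor(c("1.5", "2")))
out <- makenumcols(toy)

sapply(out, class)  # id is "character", n is "numeric"
```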

Adapted from: Get column types of excel sheet automatically

Ferroao