86

What is the quickest/best way to change a large number of columns to numeric from factor?

I used the following code but it appears to have re-ordered my data.

> head(stats[,1:2])
  rk                 team
1  1 Washington Capitals*
2  2     San Jose Sharks*
3  3  Chicago Blackhawks*
4  4     Phoenix Coyotes*
5  5   New Jersey Devils*
6  6   Vancouver Canucks*

for(i in c(1,3:ncol(stats))) {
    stats[,i] <- as.numeric(stats[,i])
}

> head(stats[,1:2])
  rk                 team
1  2 Washington Capitals*
2 13     San Jose Sharks*
3 24  Chicago Blackhawks*
4 26     Phoenix Coyotes*
5 27   New Jersey Devils*
6 28   Vancouver Canucks*

What is the best way, short of naming every column as in:

df$colname <- as.numeric(df$colname)
audeoudh
Btibert3
  • 4
    Isn't there any generic solution?. Some of the solutions proposed here only work with factors, other work always except with factors, and so on... – skan May 06 '16 at 10:32

16 Answers

76

You have to be careful when converting factors to numeric. Here is a line of code that converts a set of columns from factor to numeric. I am assuming here that the columns to be converted are 1, 3, 4 and 5; change cols accordingly.

cols = c(1, 3, 4, 5)
df[,cols] = apply(df[,cols], 2, function(x) as.numeric(as.character(x)))
Andreas Dibiasi
Ramnath
  • 3
    This won't work correctly. Example: `x<-as.factor(1:3); df<-data.frame(a=x,y=runif(3),b=x,c=x,d=x)`. I don't think that `apply` is appropriate to this kind of problems. – Marek Sep 26 '10 at 15:07
  • 1
    apply works perfectly in these situations. the error in my code was using margin = 1, instead of 2 as the function needs to be applied column wise. i have edited my answer accordingly. – Ramnath Sep 26 '10 at 19:46
  • Now it works. But I think it could be done without `apply`. Check my edit. – Marek Sep 27 '10 at 10:00
  • 2
    ... or Joris answer with `unlist`. And `as.character` conversion in your solution is not needed cause `apply` converts `df[,cols]` to `character` so `apply(df[,cols], 2, function(x) as.numeric(x))` will work too. – Marek Sep 27 '10 at 10:12
  • @Ramnath,why do you use `=`?why not `<-`? – kittygirl Apr 16 '19 at 13:00
59

Further to Ramnath's answer, the behaviour you are experiencing is due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.

Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:

stats[,i] <- as.numeric(stats[,i])

to read

stats[,i] <- as.numeric(as.character(stats[,i]))

This is FAQ 7.10 in the R FAQ.
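
To make the difference concrete, here is a minimal sketch with made-up data (the factor f is hypothetical, not the OP's stats). It only illustrates the internal-code behaviour described above:

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10", "2", "33", "2"))

levels(f)                    # levels are sorted as strings: "10" "2" "33"
as.numeric(f)                # internal codes: 1 2 3 2
as.numeric(as.character(f))  # the values themselves: 10 2 33 2
```

The codes are positions in levels(f), which is why the converted column can look re-ordered or renumbered.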

HTH

Gavin Simpson
  • 2
    No need for any kind of loop. Just use the indices and unlist(). Edit : I added an answer illustrating this. – Joris Meys Sep 26 '10 at 22:19
  • This approach only works in this specific case. I tried to use it to convert columns to `factor` and it didnt work. `sapply` or `mutate_if` seem to be more generally applicable solutions. – Leo Aug 16 '17 at 13:04
  • @Leo Care to expand, cos I know for a fact this works. It's *exactly* the same solution as Ramnath's below except he uses `apply` to run the loop and the OP was using a `for` loop explicitly. In fact, all the highly up-voted answers use the `as.numeric(as.character())` idiom. – Gavin Simpson Aug 17 '17 at 03:05
  • Yes it works to change the class of multiple columns to `numeric`, but it does not work in reverse (to change the class of multiple columns to `factor`). If you use indices you need `unlist()` and when applied to columns with characters it unlists every single character, which makes it not work any more when putting the output back into `stats[,i]`. Check the answer here: https://stackoverflow.com/questions/45713473/convert-data-frame-columns-to-factor-with-indexing – Leo Aug 17 '17 at 15:46
  • @Leo *of course* it doesn't work in reverse! What on earth gave you the impression that it would? It was never designed and the OP never asked for that. Hard to answer questions that aren't asked. If you want to convert *to* a factor use `as.factor()` in place of `as.numeric(as.character())` here and it'll work just fine. Of course, if you have a mix of columns you'll need to choose `i` selectively, but that is also trivial. – Gavin Simpson Aug 18 '17 at 15:56
  • What I meant was `df[, indexlist] = as.factor(df[, indexlist])` does not work. The reason I expected it to work, is that the title of the question was _change the class of many columns in a data frame_, but I see that you edited it now. Because of the former title, many other questions about converting multiple data frame columns to `factor` were marked as a duplicate of this question. Which is only problematic for your answer. The answers that use `apply` or `mutate_if` do answer the questions that point here as a duplicate. That is what I meant by them being more generally applicable. – Leo Aug 19 '17 at 15:02
  • @Leo If you look carefully at my answer, you will see that I don't suggest `df[, indexlist]` at all. I simply suggest replacing one line *in the OP's **`for()`** loop* with the second line of code in my answer. Therefore your criticism of my answer is based on a misunderstanding. The OPs `for()` loop takes the place of the `apply` or `mutate_if` functions you suggest. To be clear `i` is a length 1 vector indexing the current column to convert, *not* a vector of length > 1 indexing many columns; other answers here suggest a loop-less answer, but that's not what I suggested. – Gavin Simpson Aug 21 '17 at 18:28
40

This can be done in one line; there's no need for a loop, be it a for-loop or an apply. Use unlist() instead:

# testdata
Df <- data.frame(
  x = as.factor(sample(1:5, 30, replace = TRUE)),
  y = as.factor(sample(1:5, 30, replace = TRUE)),
  z = as.factor(sample(1:5, 30, replace = TRUE)),
  w = as.factor(sample(1:5, 30, replace = TRUE))
)
##

Df[,c("y","w")] <- as.numeric(as.character(unlist(Df[,c("y","w")])))

str(Df)

Edit: for your code, this becomes:

id <- c(1,3:ncol(stats))
stats[,id] <- as.numeric(as.character(unlist(stats[,id])))

Obviously, if you have a one-column data frame and you don't want the automatic dimension reduction of R to convert it to a vector, you'll have to add the drop=FALSE argument.
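
As a rough illustration of the drop=FALSE point, here is a sketch on a hypothetical one-column data frame (the name one and its values are made up):

```r
# Hypothetical one-column data frame holding a factor
one <- data.frame(y = factor(c("4", "8", "15")))

class(one[, "y"])                # "factor": the data frame was reduced to a vector
class(one[, "y", drop = FALSE])  # "data.frame": the shape is kept

# Same conversion idiom as above, selecting the column with drop = FALSE
one[, "y"] <- as.numeric(as.character(unlist(one[, "y", drop = FALSE])))
str(one)
```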

Joris Meys
  • 1
    Small improvement could be setting `recursive` and `use.names` parameters of `unlist` both to `FALSE`. – Marek Sep 27 '10 at 10:10
  • @Marek : true. I love this game :-) – Joris Meys Sep 27 '10 at 11:49
  • I am just going to add for those looking for answers in the future, this is not equivalent to op + gavin's method if the dataframe is of only one column. It will convert to a vector in that case, whereas op's will still be a dataframe. – themartinmcfly Feb 27 '13 at 03:15
  • @themartinmcfly the good ol' `drop=FALSE`... But thx for pointing that out, I added it to the answer. – Joris Meys Feb 27 '13 at 13:37
  • 1
    for those working with tidyverse: interestingly, this does not seem to work when the object is also a tibble: The code fails after `Df <- tibble::as_tibble(Df)` – tjebo Jun 03 '20 at 15:21
  • 1
    @Tjebo with the updates of tibble and the diversion between tibbles and data frames, this old approach isn't the best option in tidyverse indeed. You better make use of the tidyselect functions in combination with `mutate_if`. Or whatever new approach is made available in the next iteration of `dplyr`... – Joris Meys Jun 06 '20 at 12:38
32

I know this question is long resolved, but I recently had a similar issue and think I've found a little more elegant and functional solution, although it requires the magrittr package.

library(magrittr)
cols = c(1, 3, 4, 5)
df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

The %<>% operator pipes and reassigns, which is very useful for keeping data cleaning and transformation simple. The lapply call is also easier to read, because you only specify the function you wish to apply.

Joe
Dan
  • 2
    neat solution. you forgot one bracket but I can't make this edit because it's too short: `df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))` – epo3 Sep 15 '16 at 09:51
  • 1
    I don't think you even need to wrap that in lappy `df[,cols] %<>% as.numeric(as.character(.))` works the same – Nate Oct 04 '16 at 20:07
  • when I try this command I get the following error `Error in [.data.table(Results, , cols) : j (the 2nd argument inside [...]) is a single symbol but column name 'cols' is not found. Perhaps you intended DT[,..cols] or DT[,cols,with=FALSE]. This difference to data.frame is deliberate and explained in FAQ 1.1.` – Urvah Shabbir Oct 11 '17 at 17:15
  • Code is like: `cols <- c("a","b"); df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))` – Urvah Shabbir Oct 11 '17 at 17:17
  • Bracket now added. – Joe Feb 22 '18 at 16:14
19

Here are some dplyr options:

# by column type:
df %>% 
  mutate_if(is.factor, ~as.numeric(as.character(.)))

# by specific columns:
df %>% 
  mutate_at(vars(x, y, z), ~as.numeric(as.character(.))) 

# all columns:
df %>% 
  mutate_all(~as.numeric(as.character(.))) 
sbha
6

I think that ucfagls found why your loop is not working.

In case you still don't want to use a loop, here is a solution with lapply:

factorToNumeric <- function(f) as.numeric(levels(f))[as.integer(f)] 
cols <- c(1, 3:ncol(stats))
stats[cols] <- lapply(stats[cols], factorToNumeric)

Edit: I found a simpler solution. It seems that as.matrix converts to character, so

stats[cols] <- as.numeric(as.matrix(stats[cols]))

should do what you want.
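
A small check of both ideas, sketched on hypothetical data (f and df_demo are made up for illustration): indexing the numeric levels by the factor's codes gives the same result as the as.character() route, and as.matrix() on an all-factor data frame yields a character matrix.

```r
# Hypothetical factor with numeric-looking labels
f <- factor(c("3.5", "1", "3.5", "20"))

via_levels    <- as.numeric(levels(f))[as.integer(f)]
via_character <- as.numeric(as.character(f))
identical(via_levels, via_character)

# as.matrix() on a data frame of factors produces character cells
df_demo <- data.frame(a = f, b = f)
m <- as.matrix(df_demo)
is.character(m)
```

The levels-based version converts each distinct level once instead of every element, which may matter for long columns with few levels.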

Community
Marek
5

lapply is pretty much designed for this

unfactorize <- c("colA", "colB")
df[, unfactorize] <- lapply(unfactorize, function(x) as.numeric(as.character(df[, x])))
transcom
  • Hi @transcom, and welcome to stackoverflow. Note that this question is about converting to numeric representation from a factor, not the other way around. See Marek's solution. – Aaron left Stack Overflow Feb 10 '14 at 16:54
  • @Aaron, understood. I posted this answer due to the ambiguity of the OP's title, operating under the assumption that others may land here looking for a way to convert multiple columns easily, regardless of class. Anyway, I've edited my answer to more appropriately address the question :) – transcom Feb 18 '14 at 16:56
2

I found this function on a couple of other duplicate threads and have found it an elegant and general way to solve this problem. This thread shows up first on most searches on this topic, so I am sharing it here to save folks some time. I take no credit for this, so see the original posts here and here for details.

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 k = rnorm(10, 5,2),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

convert.magic <- function(obj, type){
  FUN1 <- switch(type,
                 character = as.character,
                 numeric = as.numeric,
                 factor = as.factor)
  out <- lapply(obj, FUN1)
  as.data.frame(out)
}

str(df)
str(convert.magic(df, "character"))
str(convert.magic(df, "factor"))
df[, c("x", "y")] <- convert.magic(df[, c("x", "y")], "factor")
Community
Electioneer
1

I would like to point out that if you have NAs in any column, simply using subscripts will not work. If there are NAs in the factor, you must use the apply script provided by Ramnath.

E.g.

Df <- data.frame(
  x = c(NA,as.factor(sample(1:5,30,r=T))),
  y = c(NA,as.factor(sample(1:5,30,r=T))),
  z = c(NA,as.factor(sample(1:5,30,r=T))),
  w = c(NA,as.factor(sample(1:5,30,r=T)))
)

Df[,c(1:4)] <- as.numeric(as.character(Df[,c(1:4)]))

Returns the following:

Warning message:
NAs introduced by coercion 

    > head(Df)
       x  y  z  w
    1 NA NA NA NA
    2 NA NA NA NA
    3 NA NA NA NA
    4 NA NA NA NA
    5 NA NA NA NA
    6 NA NA NA NA

But:

Df[,c(1:4)]= apply(Df[,c(1:4)], 2, function(x) as.numeric(as.character(x)))

Returns:

> head(Df)
   x  y  z  w
1 NA NA NA NA
2  2  3  4  1
3  1  5  3  4
4  2  3  4  1
5  5  3  5  5
6  4  2  4  4
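
A hedged reading of the output above: the all-NA result seems to come from calling as.character() on a whole data frame, which yields one string per column rather than one per cell, and not from the NA values themselves. The sketch below uses hypothetical data without any NAs (demo is made up) and shows the same symptom:

```r
# Hypothetical data frame of factors, no missing values
demo <- data.frame(a = factor(c("10", "20", "30")),
                   b = factor(c("5", "6", "7")))

# as.character() on the data frame gives one string per column
whole <- as.character(demo[, c("a", "b")])
length(whole)

# Those strings are not parseable as numbers, so coercion produces NA
suppressWarnings(as.numeric(whole))

# Converting column by column gives the intended values
per_column <- lapply(demo[, c("a", "b")], function(x) as.numeric(as.character(x)))
str(per_column)
```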
Elizabeth
1

You can use the unfactor() function from the "varhandle" package on CRAN:

library("varhandle")

my_iris <- data.frame(Sepal.Length = factor(iris$Sepal.Length),
                      sample_id = factor(1:nrow(iris)))

my_iris <- unfactor(my_iris)
Mehrad Mahmoudian
1

I like this code because it's pretty handy:

  data[] <- lapply(data, function(x) type.convert(as.character(x), as.is = TRUE)) #change all vars to their best fitting data type

It is not exactly what was asked for (convert to numeric), but in many cases even more appropriate.
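
For illustration, here is the same idiom on a small hypothetical data frame (demo and its columns are made up), showing which type each column ends up with:

```r
# Hypothetical all-factor data frame
demo <- data.frame(n = factor(c("1", "2", "3")),
                   d = factor(c("1.5", "2.5", "4")),
                   s = factor(c("a", "b", "c")))

# Let type.convert() pick the narrowest fitting type per column;
# as.is = TRUE keeps non-numeric text as character instead of factor
demo[] <- lapply(demo, function(x) type.convert(as.character(x), as.is = TRUE))

sapply(demo, class)  # n: integer, d: numeric, s: character
```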

SDahm
1

Based on @SDahm's answer, this was an "optimal" solution for my tibble:

data %<>% lapply(type.convert) %>% as.data.table()

This requires magrittr (for %<>% and %>%) and data.table (for as.data.table()).

James Hirschorn
1

I tried a bunch of these on a similar problem and kept getting NAs. Base R has some really irritating coercion behaviors, which are generally fixed in Tidyverse packages. I used to avoid them because I didn't want to create dependencies, but they make life so much easier that now I don't even bother trying to figure out the Base R solution most of the time.

Here's the Tidyverse solution, which is extremely simple and elegant:

library(purrr)

mydf <- data.frame(
  x1 = factor(c(3, 5, 4, 2, 1)),
  x2 = factor(c("A", "C", "B", "D", "E")),
  x3 = c(10, 8, 6, 4, 2))

map_df(mydf, as.numeric)
Aaron Cooley
  • Most of the answers (at least all the top answers) make sure to do the `as.numeric(as.character())` conversion to avoid the [all-too-common](https://stackoverflow.com/q/3418128/903061) conversion of integer levels instead of values to numeric. I'd happily upvote this answer if you show that option. – Gregor Thomas Feb 04 '19 at 16:19
1

df$colname <- as.numeric(df$colname)

I tried this for changing one column's type, and I think it is better than many other versions if you are not going to change all the column types.

df$colname <- as.character(df$colname)

for the vice versa.

0

I had problems converting all columns to numeric with an apply() call:

apply(data, 2, as.numeric)

The problem turns out to be because some of the strings had a comma in them -- e.g. "1,024.63" instead of "1024.63" -- and R does not like this way of formatting numbers. So I removed them and then ran as.numeric():

library(stringr)  # provides str_replace_all()

data = as.data.frame(apply(data, 2, function(x) {
  y = str_replace_all(x, ",", "") # remove commas
  return(as.numeric(y)) # then convert
}))

Note that this requires the stringr package to be loaded.
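
If you would rather avoid the extra dependency, a base R sketch of the same idea could look like this. It swaps in gsub() for str_replace_all(); the data frame raw and its values are hypothetical:

```r
# Hypothetical character columns with thousands separators
raw <- data.frame(price = c("1,024.63", "12.5", "3,000"),
                  qty   = c("1", "2,500", "7"),
                  stringsAsFactors = FALSE)

clean <- raw
# Strip literal commas, then convert each column to numeric
clean[] <- lapply(raw, function(x) as.numeric(gsub(",", "", x, fixed = TRUE)))

str(clean)
```

Using lapply() with clean[] keeps the result a data frame, so no as.data.frame() wrapper is needed.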

CoderGuy123
0

This is what worked for me. The apply() function tries to coerce df to a matrix and returns NAs.

numeric.df <- as.data.frame(sapply(df, as.numeric))