
I ran into an unexpected problem when trying to convert multiple columns of a data table into factor columns. I've reproduced it as follows:

```r
library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
tst[,as.factor(a)]                # returns the expected result
tst[,as.factor('a'),with=FALSE]   # returns an error
```

The latter command returns `Error in Math.factor(j) : abs not meaningful for factors`. I found this when trying `tst[,lapply(cols, as.factor),with=FALSE]`, where `cols` was a vector of the names of the columns I wanted to convert to factors. Is there any solution or workaround for this?
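The message itself comes from base R, not from data.table: math functions such as `abs()` called on a factor dispatch to the `Math.factor` method, which stops with this error. A minimal base-R sketch (no data.table needed) that reproduces it on `as.factor('a')`; the names `f` and `msg` are only for illustration, and the exact quoting inside the message varies between R versions and locales.

```r
# A factor built from the string 'a': one element, one level
f <- as.factor('a')
is.character(f)   # FALSE: a factor is not a character vector
is.logical(f)     # FALSE
as.integer(f)     # 1: the underlying integer code

# abs() on a factor dispatches to Math.factor, which refuses it
msg <- tryCatch(abs(f), error = function(e) conditionMessage(e))
print(msg)
```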

tresbot
  • +1 I've added: [Gracefully catch internal abs() error on j when with=FALSE but j is wrongly factor](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4867&group_id=240&atid=978) – Matt Dowle Aug 30 '13 at 09:36

2 Answers


I found one solution:

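```r
library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
cols <- 'a'   # names of the columns to convert
# convert every column named in cols to a factor, by reference
tst[,(cols):=lapply(.SD, as.factor),.SDcols=cols]
```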

Still, the behavior described above seems buggy.

tresbot
  • You were trying to index the data.table with a factor - factors are neither characters nor numerics (they are categorical values with no clear magnitude), so data.table spits an error. – thelatemail Aug 30 '13 at 06:32
  • Also `tst[,as.factor(a)]` is just returning `as.factor(tst$a)` and is not indexing the data.table at all. Try `tst[,1:5]` to see what I mean. – thelatemail Aug 30 '13 at 06:33
  • You can try `tst[, a := as.factor(a)]` if you have just one column, or do what you've shown, or use `set` within a for-loop over the columns. – Arun Aug 30 '13 at 06:41
  • Your error occurs because you use `with=FALSE`, and there are only a few possibilities for `j`. `data.table` works out which one applies by checking whether `j` is *logical*, *character*, etc., and then falls through to treating it as column numbers, so it checks `if (abs(j) > ncol(.))` where `j` is `as.factor('a')`. That calls `abs` on a factor... – Arun Aug 30 '13 at 06:42
  • I was originally trying to change the type of multiple columns quickly with `cols:=as.factor(cols)`. Is Arun's suggestion of set within a for loop preferred/faster? – tresbot Aug 30 '13 at 23:14
  • @tresbot, have a look at ?set – Arun Sep 02 '13 at 19:52
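A minimal sketch of the `set()`-in-a-loop approach suggested in the comments, assuming a reasonably recent data.table; the table `dt` and the vector `cols` are hypothetical. `set()` is documented as a low-overhead, loopable version of `:=`, and like `:=` it modifies the table by reference, so no reassignment is needed.

```r
library(data.table)

# Hypothetical table: two character columns to convert, one integer column to leave alone
dt <- data.table(a = c("b", "b", "c", "c"),
                 b = c("x", "y", "x", "y"),
                 n = 1:4)
cols <- c("a", "b")  # names of the columns to convert

# One set() call per column; each call replaces that column in place
for (col in cols) {
  set(dt, j = col, value = as.factor(dt[[col]]))
}

str(dt)
```

For a handful of columns, the `.SDcols` one-liner from the first answer does the same job; the loop is mainly useful when you want to handle one column at a time.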

This is now fixed in v1.8.11, but probably not in the way you'd hoped for. From NEWS:

FR #4867 is now implemented. DT[, as.factor('x'), with=FALSE] where x is a column in DT, is now equivalent to DT[, "x", with=FALSE] instead of ending up with an error. Thanks to tresbot for reporting on SO: Converting multiple data.table columns to factors in R


Some explanation: when `with=FALSE` is used, the columns of the data.table are no longer visible as variables inside `j`. That is:

```r
tst[, as.factor(a), with=FALSE] # would give "a" not found!
```

would fail with an error saying that `a` was not found. What you did instead was:

```r
tst[, as.factor('a'), with=FALSE]
```

You're in fact creating a factor from the string "a" (a single value with the single level "a") and asking to subset the table by it. That doesn't really make much sense. Take the case of data.frames:

```r
DF <- data.frame(x=1:5, y=6:10)
DF[, c("x", "y")] # gives back DF

DF[, factor(c("x", "y"))] # gives back DF again, not factor columns
DF[, factor(c("x", "x"))] # gives back two columns of "x", still integer, not factor!
```

So, basically, when you use `with=FALSE`, `as.factor()` is applied not to the elements of the column but just to the column name. I hope I've managed to convey the difference well. Feel free to edit/comment if anything is unclear.
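For data.frames, the rule behind the `DF` results is documented in R's `?Extract` help page: indexing by a factor uses its integer codes, not the labels it prints. `factor(c("x", "y"))` and `factor(c("x", "x"))` only seem to select by name because their codes (1, 2 and 1, 1) happen to match the column positions. A base-R check with a factor whose code does not match its label (`idx`, `by_factor` and `by_character` are hypothetical names):

```r
DF <- data.frame(x = 1:5, y = 6:10)

# factor("y") has a single level, "y", so its integer code is 1
idx <- factor("y")
as.integer(idx)                          # 1

by_factor    <- DF[, idx]                # indexes by code 1: column x, not y!
by_character <- DF[, as.character(idx)]  # indexes by name: column y

by_factor     # the values of column x, 1:5
by_character  # the values of column y, 6:10
```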

Arun