Error in converting categorical variables to factor in R

Question

While following a tutorial, I tried a different method for converting categorical variables to factors.

The article uses the following method:

library(MASS)
library(rpart)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)

I replaced the last line with

birthwt[cols] <- as.factor((birthwt[cols]))

but every value in the result is NA.

What is wrong with that?

score 2 · Accepted Answer · answered Nov 24 '20 at 16:58

as.factor(birthwt[cols]) calls as.factor on a list of 5 vectors (a data frame is a list of its columns). When you do that, R interprets each of those 5 vectors as a level, and the column headers as the labels, of a single factor variable, which is clearly not what you want:

> as.factor(birthwt[cols])
  low  race smoke    ht    ui 
 <NA>  <NA>  <NA>  <NA>  <NA> 
5 Levels: c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) ...
> labels(as.factor(birthwt[cols]))
[1] "low"   "race"  "smoke" "ht"    "ui"

lapply iterates over a list, calling the function as.factor on each of the vectors separately in that list. You need to do this to convert each variable separately into a factor, rather than attempting to convert the entire list into a single factor, which is what as.factor(birthwt[cols]) does.
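
To make the difference concrete, here is a minimal sketch of the per-column conversion with a quick check afterwards. It works on a copy named bw, which is an illustrative name and not part of the original code; the checks are just one way to confirm the result.

```r
library(MASS)  # provides the birthwt data set

cols <- c("low", "race", "smoke", "ht", "ui")
bw <- birthwt  # illustrative copy, so the original data stays untouched

# bw[cols] is a data frame, i.e. a list of 5 columns, not a single vector
is.list(bw[cols])

# Convert each selected column separately
bw[cols] <- lapply(bw[cols], as.factor)

# Check that every selected column is now a factor
sapply(bw[cols], is.factor)

# Inspect the levels of one converted column
levels(bw$race)
```

Assigning the result of lapply back into bw[cols] replaces the columns one by one, so the data frame keeps its shape and only the column types change.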

Sorry I got in just ahead of you [on this question](https://stackoverflow.com/a/64994135/903061). Nice answer here, good explanation! — Gregor Thomas, Nov 24 '20 at 20:30
