In this tutorial, I tried another method of converting categorical variables to factors.

In the article, the following method is used.

library(MASS)
library(rpart)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)

and I replaced the last line by

birthwt[cols] <- as.factor((birthwt[cols]))

but the result is all NA.

What is wrong with that?

Eilia

1 Answer

as.factor((birthwt[cols])) calls as.factor on a list of 5 vectors. If you do that, R will interpret each of those 5 vectors as the levels, and the column headers as the labels, of a single factor variable, which is clearly not what you want:

> as.factor(birthwt[cols])
  low  race smoke    ht    ui 
 <NA>  <NA>  <NA>  <NA>  <NA> 
5 Levels: c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) ...
> labels(as.factor(birthwt[cols]))
[1] "low"   "race"  "smoke" "ht"    "ui" 

lapply iterates over a list, calling as.factor on each vector in that list separately. That is what you need here: it converts each variable into its own factor, rather than attempting to convert the entire list into a single factor, which is what as.factor(birthwt[cols]) does.
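
To check the column-by-column conversion for yourself, here is a minimal sketch. It assumes the MASS package is installed, and it works on a copy called bw (a name chosen only for this illustration) so the original birthwt data stays untouched.

```r
library(MASS)  # provides the birthwt data set

cols <- c("low", "race", "smoke", "ht", "ui")
bw <- birthwt  # hypothetical copy, used only for this illustration

# lapply returns a list with one factor per selected column,
# which is assigned back column by column
bw[cols] <- lapply(bw[cols], as.factor)

# TRUE for every selected column if the conversion worked
sapply(bw[cols], is.factor)

# each column keeps its own levels, e.g. "0" and "1" for smoke
levels(bw$smoke)
```

Columns not listed in cols are left as they were, so you can convert only the variables that are really categorical.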

henryn
  • Sorry I got in just ahead of you [on this question](https://stackoverflow.com/a/64994135/903061). Nice answer here, good explanation! – Gregor Thomas Nov 24 '20 at 20:30