0

I have the following data.frame called training:

event.5    er     her2   lymph   grade
TRUE       TRUE   FALSE  FALSE   3
FALSE      FALSE  TRUE   FALSE   3
...

I would like to convert all columns to factors using:

training <- do.call(as.factor, training)

But I get the following error:

Error in (function (x):
unused arguments (event.5 = c (TRUE, FALSE,...)

I can manually convert each column to a factor, but I want something more elegant. I would greatly appreciate any suggestion. Thank you!

Johnathan

3 Answers

2

I think it would be most useful to explain the error message, since @nicola has already presented the "correct answer". It comes down to the difference between do.call and lapply:

do.call: The elements of the second argument to do.call are matched to the formal arguments of the first argument (the function). So the 'event.5' item is offered to as.factor by name, the interpreter cannot find any formal parameter of as.factor that matches it, and that generates the error. do.call does not have an ellipsis in its formals list.
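The name matching described above can be seen in a small sketch. The function f and the lists args_ok and args_bad are hypothetical names made up for illustration; only do.call, as.factor and tryCatch come from base R.

```r
# Hypothetical two-argument function, used only for illustration
f <- function(x, y) x - y

# List names are matched to the formals of f, so order does not matter:
# this is the same as calling f(y = 1, x = 10)
args_ok <- list(y = 1, x = 10)
ok <- do.call(f, args_ok)

# as.factor has a single formal, x, and no ellipsis, so an element
# named event.5 matches nothing and the call fails
args_bad <- list(event.5 = c(TRUE, FALSE))
msg <- tryCatch(do.call(as.factor, args_bad),
                error = function(e) conditionMessage(e))
msg   # the "unused argument" message from the question
```

Passing the whole data.frame, as in the question, fails the same way because every column arrives as a named argument that as.factor does not know.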

lapply: The elements of the first argument are passed one by one (and unnamed) to the function. There is an optional ellipsis mechanism that allows further arguments to be offered, but they are offered as a whole rather than one by one. Those arguments are best named in full, since unnamed ones are matched by position after the element coming from X. The named arguments might even include the first of the function's formals, in which case it is the first unmatched argument in the formals that receives the values coming in from the X argument to lapply. If you want to pass multiple lists in a one-by-one fashion, then look at mapply.
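A short sketch of these three cases, assuming a hypothetical helper pow that is not part of the original question:

```r
# Hypothetical helper, used only for illustration
pow <- function(base, exponent) base^exponent

# Each element of X goes to the first formal (base);
# exponent = 2 is handed over unchanged on every call
squares <- lapply(1:3, pow, exponent = 2)

# Naming the first formal sends the elements of X to the
# next unmatched formal (exponent) instead
powers_of_two <- lapply(1:3, pow, base = 2)

# mapply walks several vectors in parallel, taking one
# element from each per call: pow(1, 3), pow(2, 2), pow(3, 1)
mixed <- mapply(pow, 1:3, 3:1)
```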

@nicola's solution also puts "[]" on the LHS so that [<- is used rather than just <-. This has the effect of preserving the data.frame structure.
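That solution is not reproduced in this thread; presumably it has the form shown in the sketch below, which is an assumption rather than a quotation. The toy values are illustrative and only mimic the columns from the question.

```r
# Toy data modelled on the question (values are illustrative)
training <- data.frame(event.5 = c(TRUE, FALSE), er = c(TRUE, FALSE),
                       her2 = c(FALSE, TRUE), lymph = c(FALSE, FALSE),
                       grade = c(3, 3))

# Plain assignment of the lapply result would give a bare list
as_list <- lapply(training, as.factor)

# Assigning into training[] replaces the columns in place,
# so the object stays a data.frame
training[] <- lapply(training, as.factor)

class(as_list)    # "list"
class(training)   # "data.frame"
```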

IRTFM
2

Since the voting is being reopened, I'd just add another way of explaining the difference between do.call and lapply to complement what @BondedDust has written.

Both do.call and lapply take a function and a list as arguments (even if in different order). But the difference is huge.

Writing

do.call(fun,list)

is basically the same as:

fun(list[[1]],list[[2]], ... , list[[length(list)]])

You are calling fun just once, and the elements of list are the arguments of fun.

For lapply:

lapply(list,fun)

is roughly equivalent to:

list(fun(list[[1]]),fun(list[[2]]), ... , fun(list[[length(list)]]))

You call fun as many times as the length of list and store the results in a list.
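As a concrete illustration, here is a minimal sketch that applies the base function range both ways; the list parts is a made-up example.

```r
parts <- list(1:2, 5:9)

# One call: range(1:2, 5:9), giving the overall minimum and maximum
once <- do.call(range, parts)

# Two calls: list(range(1:2), range(5:9)), one result per element
each <- lapply(parts, range)

length(once)   # 2, a single vector holding the overall min and max
length(each)   # 2, a list with one range per element of parts
```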

Hope this clarifies a bit.

nicola
0

You can do this:

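# Small example with the same columns as in the question
df <- data.frame(event.5 = c(TRUE, FALSE), er = c(TRUE, FALSE), her2 = c(FALSE, TRUE), lymph = c(FALSE, FALSE), grade = c(3, 3))
# lapply() converts each column; as.data.frame() rebuilds the data frame
df <- as.data.frame(lapply(df, as.factor))
df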
##   event.5    er  her2 lymph grade
## 1    TRUE  TRUE FALSE FALSE     3
## 2   FALSE FALSE  TRUE FALSE     3
bgoldst