
I want to write a generic script to find the information gain of a set of features with respect to the final column. For instance, in a data frame built from a matrix with 26 columns, I'd write:

information.gain(V26~.,table)

The problem is that the formula V26~. doesn't have an obvious generic form. My first thought was to try this:

> nms <- colnames(table)
> nms[length(nms)]
[1] "V26"
> information.gain(nms[length(nms)]~., table)
Error in model.frame.default(formula, data, na.action = NULL) : 
  variable lengths differ (found for 'V1')

which seemed wrong on account of nms being a vector of strings. Is there a way to coerce the name into something that can be part of a formula?
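A minimal sketch of why the attempt above fails, assuming a stand-in data frame `DF` (the original `table` is not shown). The `~` operator does not evaluate its left-hand side, so the formula stores the expression `nms[length(nms)]` itself rather than the name `V26`. When `model.frame` later evaluates that expression it gets a single string, which would explain the "variable lengths differ" error.

```r
# Stand-in data (an assumption; the original `table` is not shown)
DF <- data.frame(matrix(runif(260), ncol = 26))
names(DF) <- paste0("V", seq_len(ncol(DF)))
nms <- colnames(DF)

# The formula captures the unevaluated expression, not the string "V26"
f_bad <- nms[length(nms)] ~ .
lhs <- f_bad[[2]]
class(lhs)     # "call"
deparse(lhs)   # "nms[length(nms)]"

# Evaluating the left-hand side gives one string, not a column with one value per row
length(eval(lhs))
```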

John Doucette
  • `paste` the formula together and then use `as.formula`. – joran Jul 22 '13 at 22:43
  • Now I just feel silly. Indeed, paste and as.formula. Thanks. – John Doucette Jul 22 '13 at 22:46
  • @JohnDoucette I have an example of this usage in this [Q&A](http://stackoverflow.com/a/17794862/429846) from earlier today. – Gavin Simpson Jul 22 '13 at 22:55
  • I've posted an Answer here as the other Q&A is not exactly the same. Also do note that the question of referring to the last element of an object has come up a lot here. Don't be surprised if this ends up being closed as a result. – Gavin Simpson Jul 22 '13 at 23:05

2 Answers


Here is a simple solution, using dummy data:

DF <- data.frame(matrix(runif(260), ncol = 26))
names(DF) <- paste0("V", seq_len(ncol(DF)))

Here I employ tail() to select the name of the last column in DF and build the formula from there.

f <- as.formula(paste(tail(names(DF), 1), "~ ."))

> f
V26 ~ .
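As a possible alternative to pasting strings, base R's `reformulate()` can build the same formula. This is only a sketch: it rebuilds the dummy data so it runs on its own, and the names `resp`, `f2`, and `mf` are illustrative rather than part of the original answer.

```r
# Dummy data, as above
DF <- data.frame(matrix(runif(260), ncol = 26))
names(DF) <- paste0("V", seq_len(ncol(DF)))

# Name of the last column, used as the response
resp <- tail(names(DF), 1)

# reformulate() takes the right-hand-side terms and the response as strings
f2 <- reformulate(".", response = resp)
deparse(f2)   # "V26 ~ ."

# Check that the formula resolves against the data: the response comes first
mf <- model.frame(f2, data = DF)
names(mf)[1]
```

Either form can then be passed wherever a formula is expected, for example `information.gain(f2, DF)`.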
Gavin Simpson

Modified to fit the question. You could copy the last column of the data frame into a separate vector and then use that vector as the response in the formula you pass to your function. For example, here is a solution that uses the number of columns:

last_col <- df[,ncol(df)]

function(last_col ~ ., blah, blah, etc)
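A rough sketch of how this approach behaves, using `model.frame` as a stand-in for the real function (the error message in the question suggests `information.gain` calls it internally). The dummy `DF` and the names `mf_full` and `mf_trim` are assumptions for illustration. One thing to watch: because `last_col` is not a column of the data, `.` still expands to every column, so the last column may need to be dropped from the data as well.

```r
# Dummy data (an assumption; any data frame would do)
DF <- data.frame(matrix(runif(260), ncol = 26))
names(DF) <- paste0("V", seq_len(ncol(DF)))

# Copy the last column into its own vector
last_col <- DF[, ncol(DF)]

# With the full data frame, `.` still includes V26 among the predictors
mf_full <- model.frame(last_col ~ ., data = DF)
"V26" %in% names(mf_full)

# Dropping the last column from the data keeps it out of the predictors
mf_trim <- model.frame(last_col ~ ., data = DF[-ncol(DF)])
names(mf_trim)[1]
ncol(mf_trim)
```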

Hope that helps!

RandallShanePhD
  • This does not answer what the question asks, which is how to build a formula object that embeds the column name. Your answer addresses a different question: how to extract the contents of the last column of a data frame. I would suggest posting that as a separate question and answering it yourself, as I could not find it anywhere (it may exist somewhere, but I could not find it). – Barnaby Feb 09 '15 at 11:51