
I am trying to make a section of code more flexible by looking up the data frame's column names and inserting them into a model formula, rather than typing the names directly. The following example works, but I have to write the field names into the formula by hand:

require(e1071)

class = c(0.25, 0.34, 0.55)
field1 = c(23, 33, 34)
field2 = c(44, 55, 32)

df = data.frame(class, field1, field2)

mysvm = svm(class ~ field1 + field2, data = df)

The following example does not work, and I do not know why:

require(e1071)

class = c(0.25, 0.34, 0.55)
field1 = c(23, 33, 34)
field2 = c(44, 55, 32)

df = data.frame(class, field1, field2)

name1 = names(df)[2]
name2 = names(df)[3]

mysvm = svm(class ~ name1 + name2, data = df)

How can I reference the 2nd and 3rd columns of a data frame by name and insert them properly into a formula?

Borealis

5 Answers


The variable name1 contains the character string "field1" (the value of names(df)[2]). When svm receives a formula containing the term name1, it looks for an object named name1, first among the columns of df and then in the environment where the formula was created, and uses that object's value. That is, svm is trying to "regress" class on the length-one character vector "field1" rather than on the column field1, which of course doesn't make sense.

One workaround here is to create the formula as a character string, and then convert it to a formula after the fact. Here's a utility function I use from time to time:

xyform <- function (y_var, x_vars) {
# y_var: a length-one character vector
# x_vars: a character vector of object names
    as.formula(sprintf("%s ~ %s", y_var, paste(x_vars, collapse = " + ")))
}
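A minimal usage sketch of xyform(), assuming the question's df. To keep it runnable without e1071, it fits lm() instead of svm(); both take an ordinary formula object, so svm(f, data = df) should accept the same f. The column names are looked up with names(df) rather than typed.

```r
# Data from the question; `class` is only a column name here,
# so the base function class() is not masked in the workspace.
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))

# xyform() as defined above: build the formula as a string, then parse it
xyform <- function(y_var, x_vars) {
  as.formula(sprintf("%s ~ %s", y_var, paste(x_vars, collapse = " + ")))
}

# Response and predictors taken from the data frame's own names
f <- xyform(names(df)[1], names(df)[2:3])
print(f)

# lm() stands in for e1071::svm() so the sketch has no extra dependency
fit <- lm(f, data = df)
```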
shadowtalker

I'm not sure whether you care how the formula appears in the printed call, but to evaluate it you can do this:

> foo <- function(n1, n2) {
      as.formula(paste("class~", paste(n1, n2, sep = "+")))
  }
> foo(name1, name2)
# class ~ field1 + field2
# <environment: 0x4d0da58>
> svm(foo(name1, name2), data = df)
#
# Call:
# svm(formula = foo(name1, name2), data = df)
#
#    
# Parameters:
#    SVM-Type:  eps-regression 
#  SVM-Kernel:  radial 
#        cost:  1 
#       gamma:  0.5 
#     epsilon:  0.1 
#
# Number of Support Vectors:  3
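Base R's stats package also provides reformulate(), which builds the same kind of formula from a character vector of term labels plus a response name, so no manual string pasting is needed. A sketch using the question's data, with lm() standing in for svm() so it runs without e1071:

```r
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))
name1 <- names(df)[2]
name2 <- names(df)[3]

# reformulate(termlabels, response) returns `response ~ term1 + term2 + ...`
f <- reformulate(c(name1, name2), response = "class")

fit <- lm(f, data = df)
```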
Rich Scriven

Here are 2 options:

Either subset your data.frame to the column names given as a parameter and use the dot notation on the right-hand side of your formula:

svm_func <- function(ll = c("field1", "field2"), xx = df) {
  # keep only the response and the requested columns;
  # "." then stands for every remaining column of that subset
  svm(class ~ ., data = xx[, c("class", ll)])
}
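A sketch of the subset-and-dot idea, using a hypothetical helper fit_subset() and lm() in place of svm(). Because the data handed to the model contains only the response and the chosen predictors, the . in class ~ . expands to exactly those predictors:

```r
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))

# Hypothetical helper: subset the columns, then let "." pick up the rest
fit_subset <- function(predictors, data) {
  lm(class ~ ., data = data[, c("class", predictors)])
}

fit_both <- fit_subset(c("field1", "field2"), df)
fit_one  <- fit_subset("field2", df)   # "." now means field2 only
names(coef(fit_both))
names(coef(fit_one))
```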

Or build the formula explicitly, similar to the other solutions, but here I use do.call to generalize formula creation to any number of parameters:

svm_func_form <- function(ll = list("field1", "field2"), xx = df) {
  # collapse the names into the right-hand side, e.g. "field1+field2"
  right_term <- do.call(paste, list(ll, collapse = "+"))
  form <- as.formula(paste("class", right_term, sep = "~"))
  svm(formula = form, data = xx)
}
agstudy

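Here are some ways to pass the column names as variables and insert the resulting formula into the call, so that the printed Call shows the expanded formula rather than the function call that built it. The first line is copied from @Richard Scriven's function: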

 fun1 <- function(n1, n2){
 form1 <- as.formula(paste("class~", paste(n1, n2, sep = "+")))
 do.call("svm", list(form1, quote(df)))
 }    

 fun1(name1, name2)

 #Call:
 #svm(formula = class ~ field1 + field2, data = df)


 #Parameters:
 # SVM-Type:  eps-regression 
 # SVM-Kernel:  radial 
 #  cost:  1 
 # gamma:  0.5 
 # epsilon:  0.1 


 #Number of Support Vectors:  3
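Why fun1 prints the expanded formula: do.call() builds the call from already-evaluated arguments, so the formula object itself is spliced into the call, while quote(df) keeps the data argument as the symbol df. A sketch of the same pattern with lm() in place of svm() and a hypothetical helper name, fit_docall():

```r
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))
name1 <- names(df)[2]
name2 <- names(df)[3]

fit_docall <- function(n1, n2) {
  form <- as.formula(paste("class ~", paste(n1, n2, sep = " + ")))
  # the formula object (not the expression that built it) goes into the call;
  # quote(df) keeps `data = df` readable in the recorded call
  do.call("lm", list(form, data = quote(df)))
}

fit <- fit_docall(name1, name2)
fit$call
```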

Or

 fun2 <- function(n1, n2){
 form1 <- as.formula(paste("class~", paste(n1, n2, sep="+")))
 eval(substitute(svm(f, df), list(f = form1)))
 }  

 fun2(name1, name2)

 #Call:
 #svm(formula = class ~ field1 + field2, data = df)


 #Parameters:
 # SVM-Type:  eps-regression 
 # SVM-Kernel:  radial 
 #  cost:  1 
 # gamma:  0.5 
 #  epsilon:  0.1 


 #Number of Support Vectors:  3
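The same splicing that substitute() does in fun2 can also be written with base R's bquote(), where .(form) is replaced by the value of form before the call is evaluated. A sketch with lm() and a hypothetical helper, fit_bquote():

```r
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))
name1 <- names(df)[2]
name2 <- names(df)[3]

fit_bquote <- function(n1, n2) {
  form <- as.formula(paste("class ~", paste(n1, n2, sep = " + ")))
  # .(form) inserts the formula object; `data = df` stays a plain symbol
  eval(bquote(lm(.(form), data = df)))
}

fit <- fit_bquote(name1, name2)
fit$call
```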

Or you could pass the formula built by @Richard Scriven's function (renamed fun2New here) as an argument to fun3:

 fun2New <- function(n1, n2){
  as.formula(paste("class~", paste(n1, n2, sep="+")))
  }



 fun3 <- function(formula, data, ...){
 Call <- match.call(expand.dots = TRUE)
 Call[[1]] <- as.name("svm")
 Call$formula <- as.formula(terms(formula))
 eval(Call)
 }

 fun3(fun2New(name1, name2), df)

 #Call:
 #svm(formula = class ~ field1 + field2, data = df)


 #Parameters:
 # SVM-Type:  eps-regression 
 # SVM-Kernel:  radial 
 #  cost:  1 
 # gamma:  0.5 
 # epsilon:  0.1 


 #Number of Support Vectors:  3
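One caveat with fun3: eval(Call) evaluates the rebuilt call in fun3's own frame, so a data argument that exists only in the caller's local scope would not be found. Evaluating in parent.frame() avoids that. A sketch with lm() in place of svm(); fit_wrapper() and run_local() are hypothetical names:

```r
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))
name1 <- names(df)[2]
name2 <- names(df)[3]

# Hypothetical wrapper in the style of fun3
fit_wrapper <- function(formula, data, ...) {
  cl <- match.call(expand.dots = TRUE)
  cl[[1]] <- quote(lm)       # call lm() instead of fit_wrapper()
  cl$formula <- formula      # splice in the evaluated formula object
  eval(cl, parent.frame())   # evaluate where fit_wrapper() was called
}

# The data lives only in this function's local scope
run_local <- function() {
  local_df <- df
  fit_wrapper(reformulate(c(name1, name2), response = "class"), local_df)
}

fit <- run_local()
fit$call
```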
akrun
  • How is `fun2New` any different from my function `foo`? – Rich Scriven Sep 21 '14 at 14:57
  • @Richard Scriven I used `fun2New` as an argument for `fun3` to change the formula in `Call` statement. I should have put that function inside `fun3`. But, somehow not getting it correct. Anyway, the OP seems to not want that kind of function. So, I am not working on it. – akrun Sep 21 '14 at 15:45
  • But did you copy it from Richard's answer without attribution? – Matthew Lundberg Sep 21 '14 at 16:05
  • @Matthew Lundberg I edited it. Sorry, I didn't think about it at the time. Is it okay now? – akrun Sep 21 '14 at 16:07

Use your own code; just write get(name1) instead of name1:

> mysvm = svm(class ~ get(name1) + get(name2), data = df)
> mysvm

Call:
svm(formula = class ~ get(name1) + get(name2), data = df)


Parameters:
   SVM-Type:  eps-regression 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.5 
    epsilon:  0.1 


Number of Support Vectors:  3
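A caveat worth knowing about the get() approach: it works because the formula terms are evaluated with the data frame's columns in scope, but the terms are labelled literally, so coefficient names and printed output show get(name1) instead of field1. A sketch with lm() in place of svm():

```r
df <- data.frame(class  = c(0.25, 0.34, 0.55),
                 field1 = c(23, 33, 34),
                 field2 = c(44, 55, 32))
name1 <- names(df)[2]
name2 <- names(df)[3]

# get("field1") is resolved against the columns of df during model fitting
fit <- lm(class ~ get(name1) + get(name2), data = df)

# The term labels keep the literal get(...) expressions
names(coef(fit))
```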
rnso