
I'm using the caret library in R for model generation. I want to generate an earth (aka MARS) model and I want to specify the degree parameter for this model generation. According to the documentation (page 11) the earth method supports this parameter.

I get the following error message when specifying the parameter:

> library(caret)
> data(trees)
> train(Volume~Girth+Height, data=trees, method='earth', degree=1)
Error in { : 
  task 1 failed - "formal argument "degree" matched by multiple actual arguments"

How can I avoid this error when specifying the degree parameter?

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] earth_3.2-3    plotrix_3.4    plotmo_1.3-1   leaps_2.9      caret_5.15-023
 [6] foreach_1.4.0  cluster_1.14.2 reshape_0.8.4  plyr_1.7.1     lattice_0.20-6

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
[5] tools_2.15.0   
Richie Cotton
theomega

2 Answers


I have always found the functions in caret both useful and somewhat maddening. Here's what's going on.

You're attempting to pass an argument to earth via the ... argument to train. The documentation for train contains this description for that argument:

arguments passed to the classification or regression routine (such as randomForest). Errors will occur if values for tuning parameters are passed here.

Tuning parameter, eh? Well, if you scroll down and examine the official list of tuning parameters for each model type, you'll see that for earth, they are degree and nprune.

So the issue here is that train is designed to automate some grid searching along tuning parameters, and the ... argument is to be used for passing further arguments to the model fitting function except for those tuning parameters.
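
The error message itself can be reproduced in base R without caret. This is only a sketch of the likely mechanism, not caret's actual code: `fit_model` and `wrapper` are hypothetical stand-ins for earth and for the internal call that train makes, on the assumption that train supplies degree itself and then forwards ... as well.

```r
# Hypothetical stand-ins: fit_model plays the role of earth(),
# wrapper plays the role of train()'s internal fitting call.
fit_model <- function(x, degree = 1) degree

# The wrapper sets degree itself AND forwards whatever arrives in `...`.
wrapper <- function(x, ...) fit_model(x, degree = 2, ...)

# Fine: degree is supplied exactly once, by the wrapper.
ok <- wrapper(1)

# Fails: degree now reaches fit_model twice, once from the wrapper
# and once through `...`.
msg <- tryCatch(wrapper(1, degree = 1),
                error = function(e) conditionMessage(e))
print(msg)
```

The captured message names the same "matched by multiple actual arguments" condition as the error in the question.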

If you want to set the tuning parameters you'll need to use other arguments, like so:

train(Volume~Girth+Height, data=trees, method='earth',
      tuneGrid = data.frame(.degree = 1,.nprune = 5))

Note that the column names carry leading periods. Frustratingly, because the default value of nprune in earth is NULL, I'm not sure you can keep the defaults this way. (Generally, setting a data frame column to NULL simply removes it.)
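
To search over several values instead of fixing a single combination, the grid can be built with base R's expand.grid. A minimal sketch, assuming a recent caret release that accepts the column names without the leading period (older releases such as the 5.15 in the question expect `.degree` and `.nprune`). The value ranges are illustrative assumptions, not recommendations.

```r
# Every combination of degree and nprune; ranges are illustrative only.
mars_grid <- expand.grid(degree = 1:2, nprune = 2:5)

# Fit only when caret and earth are installed. train() evaluates each
# row of the grid by resampling and keeps the best one.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("earth", quietly = TRUE)) {
  data(trees)
  fit <- caret::train(Volume ~ Girth + Height, data = trees,
                      method = "earth", tuneGrid = mars_grid)
  print(fit$bestTune)
}
```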

joran
  • Thanks for the solution. The problem is: how can I keep 'nprune' at its default value? I looked it up in the source and found that the default value is calculated using the private (non-callable) function `marsSeq`, so it is not a fixed value. As you said, you cannot leave it out. – theomega May 08 '12 at 19:45
  • See my answer below which provides a solution to my (and your) question. – theomega May 17 '12 at 17:13
  • Is the dot in front of the parameter names really necessary? For me, it works with and without it. – Antoine Feb 25 '20 at 23:37

I found out how to do it; joran pointed me in the right direction:

Create a new function that generates the training grid. This function must accept the two parameters len and data. To retrieve the original training grid, you can call the createGrid function provided by the caret package. You can then modify the grid to your needs. For example, to leave the nprune parameter unchanged and add degree from 1 to 5, use the following code:

  # Grid function for train(): keeps caret's default nprune values
  # and crosses them with degree = 1..5.
  createMARSGrid <- function(len, data) {
      g <- createGrid("earth", len, data)
      expand.grid(.nprune = g$.nprune, .degree = 1:5)
  }

Then invoke it like this:

train(formula, data=data, method='earth', tuneGrid = createMARSGrid)
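
The code above targets the caret version in the question. For newer releases, the same idea might look like the sketch below. It rests on several assumptions: that createGrid is no longer available, that `getModelInfo("earth")` returns a model definition whose `grid` element takes `x`, `y` and `len`, and that tuneGrid must be a data frame rather than a function. `default_mars_grid` and `make_mars_grid` are hypothetical helper names.

```r
# Assumption: caret::getModelInfo() exposes the model's default grid function.
default_mars_grid <- function(x, y, len = 3) {
  info <- caret::getModelInfo("earth", regex = FALSE)[["earth"]]
  info$grid(x = x, y = y, len = len)
}

# Keep caret's default nprune values and cross them with degree = 1..5.
make_mars_grid <- function(x, y, len = 3) {
  g <- default_mars_grid(x, y, len)
  expand.grid(nprune = unique(g$nprune), degree = 1:5)
}

# Run only when caret and earth are installed.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("earth", quietly = TRUE)) {
  data(trees)
  grid <- make_mars_grid(trees[, c("Girth", "Height")], trees$Volume)
  fit <- caret::train(Volume ~ Girth + Height, data = trees,
                      method = "earth", tuneGrid = grid)
  print(fit$bestTune)
}
```

Here the grid is computed up front and passed as a data frame, instead of handing a function to tuneGrid.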
theomega
  • Thanks @theomega. This is very helpful. Do you know what the connection between `len` in `createGrid` and `tuneLength` (argument to `train`) is? Also, why does `createGrid` need to receive the data for certain models? (`?createGrid` does not say much on this) – Amelio Vazquez-Reina Feb 12 '13 at 18:33
  • Please ask a new question and I'll be happy to help you – theomega Feb 12 '13 at 21:16
  • Thanks theomega: Here is my question: http://stackoverflow.com/questions/14839730/caret-errors-with-creategrid-for-rf-randomforest. I am still missing an answer clarifying the connection between `tuneLength` and the len parameter in `createGrid`, e.g. can they be used together? what is their relationship? – Amelio Vazquez-Reina Feb 12 '13 at 21:18