
I am attempting to carry out lasso regression using the lars package, but I cannot get the lars() call to work. I have entered this code:

diabetes<-read.table("diabetes.txt", header=TRUE)
diabetes
library(lars)
diabetes.lasso = lars(diabetes$x, diabetes$y, type = "lasso")

However, I get the error message: Error in rep(1, n) : invalid 'times' argument.

I have tried entering it like this:

diabetes<-read.table("diabetes.txt", header=TRUE)
library(lars)
data(diabetes)
diabetes.lasso = lars(age+sex+bmi+map+td+ldl+hdl+tch+ltg+glu, y, type = "lasso")

But then I get the error message: 'Error in lars(age+sex + bmi + map + td + ldl + hdl + tch + ltg + glu, y, type = "lasso") : object 'age' not found'

Where am I going wrong?

EDIT: Here is my data. It looks like the sample below, but with another 5 columns.

             ldl          hdl          tch          ltg          glu
1   -0.034820763 -0.043400846 -0.002592262  0.019908421 -0.017646125
2   -0.019163340  0.074411564 -0.039493383 -0.068329744 -0.092204050
3   -0.034194466 -0.032355932 -0.002592262  0.002863771 -0.025930339
4    0.024990593 -0.036037570  0.034308859  0.022692023 -0.009361911
5    0.015596140  0.008142084 -0.002592262 -0.031991445 -0.046640874
Paul Hiemstra
math11
  • Can you post the result of `dput(diabetes)` in the first section (to make it reproducible)? – David Robinson Jan 07 '13 at 19:20
  • In `lars()` `x` is supposed to be a matrix and `y` a vector. Your data are unlikely to meet the requirements for at least `x` as that will be a single vector of some description if the code you show is accurate. – Gavin Simpson Jan 07 '13 at 19:24
  • My data is a txt file with about 500 samples of data for 10 variables. How can I edit the code to make it work? I've tried the two versions above, but neither works. – math11 Jan 07 '13 at 19:31
  • Show us your data; a few lines will be enough. – Gago-Silva Jan 07 '13 at 19:46
  • I've added my data to the original post now – math11 Jan 07 '13 at 19:58
  • I still think it might be ambiguous what your data object looks like. You might have several now. I suspect that executing `data(diabetes)` would overwrite an existing 'diabetes' object that you might have read in from that .txt file. The 'lars' package has a built-in version of 'diabetes' – IRTFM Jan 07 '13 at 20:06
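To see where the first error in the question comes from, here is a minimal sketch. The data frame `mydiabetes` is a hypothetical stand-in for what `read.table("diabetes.txt", header = TRUE)` returns (values copied from the rows shown above). Such a data frame has no column called `x`, so `diabetes$x` is `NULL`; judging from the error text, `lars()` then calls `rep(1, n)` with `n` taken from the row count of that `NULL`.

```r
# Hypothetical stand-in for the data frame returned by
# read.table("diabetes.txt", header = TRUE): ordinary numeric columns only
# (values copied from the first rows shown in the question).
mydiabetes <- data.frame(
  ldl = c(-0.034820763, -0.019163340, -0.034194466),
  hdl = c(-0.043400846,  0.074411564, -0.032355932)
)

# There is no column named "x", so `$x` silently returns NULL.
x <- mydiabetes$x
is.null(x)            # TRUE

# The row count of NULL is NULL as well ...
n <- dim(x)[1]
is.null(n)            # TRUE

# ... and rep(1, NULL) fails with the message quoted in the question.
msg <- tryCatch(rep(1, n), error = function(e) conditionMessage(e))
msg
```

In other words, the message is not really about `rep()`; it is a symptom of passing `NULL` where `lars()` expects a predictor matrix.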

2 Answers


lars::lars does not appear to have a formula interface, which means you cannot use the formula specification for the column names (and furthermore it does not accept a "data=" argument). For more information on this and other "data mining" topics, you might want to get a copy of the classic text: "Elements of Statistical Learning". Try this:

> library(lars)
> data(diabetes)   # loads the built-in 'diabetes' data, replacing any object of that name
> diabetes.lasso = with( diabetes, lars(x, y, type = "lasso"))
> summary(diabetes.lasso)
LARS/LASSO
Call: lars(x = x, y = y, type = "lasso")
   Df     Rss       Cp
0   1 2621009 453.7263
1   2 2510465 418.0322
2   3 1700369 143.8012
3   4 1527165  86.7411
4   5 1365734  33.6957
5   6 1324118  21.5052
6   7 1308932  18.3270
7   8 1275355   8.8775
8   9 1270233   9.1311
9  10 1269390  10.8435
10 11 1264977  11.3390
11 10 1264765   9.2668
12 11 1263983  11.0000
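Since `lars()` only takes a matrix and a vector, one way to get something formula-like is a small wrapper. The sketch below is hypothetical: `lars_formula` is not a function in the lars package, and the data are simulated. It builds `x` with `model.frame()`/`model.matrix()` and passes any further arguments, such as `type`, through to `lars()`.

```r
library(lars)

# Hypothetical convenience wrapper (NOT part of the lars package): turn a
# formula plus a data frame into the matrix/vector pair that lars() expects.
lars_formula <- function(formula, data, ...) {
  mf <- model.frame(formula, data = data)
  y  <- model.response(mf)
  # model.matrix() expands factors into dummy columns; drop the intercept
  # column because lars() centres x and y itself (intercept = TRUE by default).
  x  <- model.matrix(attr(mf, "terms"), data = mf)[, -1, drop = FALSE]
  lars::lars(x, y, ...)
}

# Simulated example data: y depends on a and b, not on c.
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
toy$y <- 2 * toy$a - toy$b + rnorm(50, sd = 0.1)

fit <- lars_formula(y ~ a + b + c, data = toy, type = "lasso")
summary(fit)
```

With data read from a text file, the call would look like `lars_formula(y ~ age + sex + bmi, data = mydiabetes, type = "lasso")`, assuming the response column is called `y`.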
IRTFM

I think some of the confusion may have to do with the fact that the diabetes data set that comes with the lars package has an unusual structure.

library(lars)
data(diabetes)
sapply(diabetes,class)
##        x         y        x2 
##   "AsIs" "numeric"    "AsIs" 

sapply(diabetes,dim)
## $x
## [1] 442  10
## 
## $y
## NULL
## 
## $x2
## [1] 442  64

In other words, diabetes is a data frame containing "columns" which are themselves matrices. In this case, with(diabetes,lars(x,y,type="lasso")) or lars(diabetes$x,diabetes$y,type="lasso") work fine. (But just lars(x,y,type="lasso") won't, because R doesn't know to look for the x and y variables within the diabetes data frame.)
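A small sketch of that structure with made-up data (the names `toy` and `xmat` are hypothetical): wrapping the matrix in `I()` stores it whole as a single "AsIs" column of the data frame, which is why `with()` can find `x` inside it.

```r
# Toy data frame with the same layout as lars' diabetes data: a numeric
# response y and a predictor matrix stored as a single AsIs "column" x.
set.seed(42)
xmat <- matrix(rnorm(20), nrow = 5,
               dimnames = list(NULL, c("age", "sex", "bmi", "map")))
toy <- data.frame(y = rnorm(5))
toy$x <- I(xmat)       # I() keeps the matrix intact as one column

sapply(toy, class)     # y is "numeric", x is "AsIs"
is.matrix(toy$x)       # TRUE: the column really is a matrix
dim(toy$x)             # 5 4
with(toy, dim(x))      # 5 4: with() looks x up inside toy
```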

However, if you are reading in your own data, you'll have to separate the response variable and the predictor matrix yourself, something like this (assuming the response column is called y):

X <- as.matrix(mydiabetes[, names(mydiabetes) != "y"])  # every column except the response
mydiabetes.lasso = lars(X, mydiabetes$y, type = "lasso")

Or you might be able to use

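X <- model.matrix(y ~ ., data = mydiabetes)[, -1]  # drop the "(Intercept)" column
mydiabetes.lasso = lars(X, mydiabetes$y, type = "lasso")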
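A quick check, on a made-up data frame standing in for `mydiabetes` (the column names and the response name `y` are assumptions), that the two ways of building the predictor matrix agree once the intercept column is dropped. `model.matrix()` has the further advantage of expanding factor columns into dummy variables, whereas `as.matrix()` on a data frame containing a factor gives a character matrix.

```r
# Hypothetical stand-in for a data frame read with read.table(): numeric
# predictors plus a response column assumed to be called "y".
set.seed(1)
mydiabetes <- data.frame(age = rnorm(8), bmi = rnorm(8), ldl = rnorm(8),
                         y = rnorm(8, mean = 150, sd = 50))

# Option 1: select every column except y by name (the logical index goes in
# the column position, after the comma).
X1 <- as.matrix(mydiabetes[, names(mydiabetes) != "y"])

# Option 2: build the design matrix from a formula, then drop its
# "(Intercept)" column, since lars() fits the intercept itself.
X2 <- model.matrix(y ~ ., data = mydiabetes)[, -1]

colnames(X2)                                  # "age" "bmi" "ldl"
all.equal(X1, X2, check.attributes = FALSE)   # TRUE (only row names differ)

# Either matrix can then be passed on, e.g.
# lars::lars(X2, mydiabetes$y, type = "lasso")
```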
Ben Bolker
  • 'diabetes' is one of the datasets that come with `pkg:lars` (as was suggested by the OP's use of `data(diabetes)`). – IRTFM Jan 07 '13 at 19:58
  • I've tried the above and I get: Error in lars(x, y, type = "lasso") : object 'x' not found – math11 Jan 07 '13 at 20:02
  • The diabetes object is a list whose 'x' element is a matrix. `colnames(diabetes$x) [1] "age" "sex" "bmi" "map" "tc" "ldl" "hdl" "tch" "ltg" "glu"` – IRTFM Jan 07 '13 at 20:08
  • actually it's a matrix ... at least that's what the interpreter thinks: `is.matrix(diabetes$x) [1] TRUE` ... and ... `is.data.frame(diabetes$x) [1] FALSE` – IRTFM Jan 07 '13 at 20:13
  • sorry, I meant that `diabetes` is a data frame, not a list (which IMO makes things a little more confusing). I agree that `diabetes$x` is a matrix. – Ben Bolker Jan 07 '13 at 20:14