
I'm trying to run a lasso regression on my large dataset but I keep obtaining the following error messages:

**Error in if (is.null(np) | (np[2] <= 1)) stop("x should be a matrix with 2 or more columns") : 
  argument is of length zero**

**Error in elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian,  : 
  (list) object cannot be coerced to type 'double'**

My dataset contains information on a travel index (GTI) for determining 'safe' LGBT travel. I'm trying to use the other variables in the dataset to fit a model that predicts the GTI. Here is the code I have used so far:

gaydata <- read.csv(file = 'GayData.csv')

[Image: sample of data and headers]

names(gaydata)[names(gaydata) == "Total"] <- "GTI"
lasso_1 = glmnet(GTI ~ Anti.Discrimination.Legislation + Marriage.Civil.Partnership + Adoption.Allowed +
                           Transgender.Rights + Intersex.3rd.Option + Equal.Age.of.Consent +
                           X.Conversion.Therapy + LGBT.Marketing + Religious.Influence + 
                           HIV.Travel.Restrictions + Anti.Gay.Laws + Homosexuality.Illegal +
                           Pride.Banned + Locals.Hostile + Prosecution + Murders + Death.Sentences, data = gaydata)

OR

lasso_2 = glmnet(x=gaydata, y=gaydata$GTI, alpha=1)

Removing 'Country' since it is categorical data that may be causing an issue

gaydata = subset(gaydata, select = -Country)

Trying to identify what is causing "argument is of length zero" error

sapply(gaydata, is.null)

sapply(gaydata, is.factor)

sum(is.null(gaydata))

While researching a solution, I've seen that nulls, incorrect column names, and factor variables typically cause this error. However, my data does not have those problems, so I'm lost. My data is a copy and paste from the

Vadim Kotov
  • Please provide a sample as _data_, not a picture. Apply `dput()` to something like a random 10 or the first 10 rows, for example `dput(gaydata[1:10,])`, and share the output here. – Vasily A Nov 17 '20 at 01:29
  • See resources on [How to make a reproducible R example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Vasily A Nov 17 '20 at 01:32
  • When you use the `x = gaydata` approach, remove the response column from `x` as well as the country column. `glmnet(x=gaydata[-c(1, 3)], y=gaydata$GTI, alpha=1)`. (Maybe remove `rank` too?) – Gregor Thomas Nov 17 '20 at 02:18
  • And `glmnet` doesn't have a formula method, so that won't work. – Gregor Thomas Nov 17 '20 at 02:19
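
The two suggestions above can be sketched roughly as follows. The data frame `toy` and its values are made up for illustration (only a few column names are borrowed from the question), and the columns are dropped by name rather than by the positions used in the comment, which is an assumption about what is safest when the column order is not known. Since `glmnet` takes a predictor matrix and a response vector rather than a formula, `x` and `y` are built separately.

```r
# Toy stand-in for the real data; the values are made up.
set.seed(1)
toy <- data.frame(
  Country = paste0("C", 1:20),
  GTI = rnorm(20),
  Pride.Banned = sample(-2:0, 20, replace = TRUE),
  Murders = sample(-2:0, 20, replace = TRUE),
  Marriage.Civil.Partnership = sample(0:2, 20, replace = TRUE)
)

# Drop the response and the categorical column by name, then convert
# the remaining (numeric) columns to a matrix.
x <- as.matrix(toy[, setdiff(names(toy), c("Country", "GTI"))])
storage.mode(x) <- "double"  # make sure the matrix holds doubles, not integers
y <- toy$GTI

# The fit is only attempted if the glmnet package is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  lasso_fit <- glmnet::glmnet(x = x, y = y, alpha = 1)
  print(lasso_fit)
}
```

Selecting by name keeps the call correct even if the columns are reordered; any other non-predictor column (such as a rank column, if present) would be added to the `setdiff()` list in the same way.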

1 Answer

Just figured it out with the help of a statistician:

Apparently I needed to convert my dataset into a matrix:

gaydata = as.matrix(gaydata)

and then use the following format:

lasso_0 = glmnet(y=gaydata[,2], x=gaydata[,-2])
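
One caveat worth checking with this approach: `as.matrix()` on a data frame that still contains a character or factor column produces a character matrix, which `glmnet` cannot use. The sketch below uses a small made-up data frame `df` (not the real data) to show the difference, and selects the response by name instead of assuming it sits in column 2.

```r
# Made-up example data; the real data set is not reproduced here.
df <- data.frame(
  Country = c("A", "B", "C", "D"),
  GTI = c(3, -1, 5, 0),
  Pride.Banned = c(0, -2, 0, -1),
  Murders = c(0, -1, 0, -2)
)

# With the non-numeric column present, every cell becomes a string.
m_all <- as.matrix(df)
print(typeof(m_all))

# Keep only the numeric columns before converting.
m_num <- as.matrix(df[, sapply(df, is.numeric)])
print(typeof(m_num))

# Select the response and the predictors by name rather than by position.
y <- m_num[, "GTI"]
x <- m_num[, colnames(m_num) != "GTI", drop = FALSE]
print(dim(x))
```

With `x` and `y` built this way, the call has the same shape as the one above: `glmnet(x = x, y = y)`.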