I have a dataframe that contains a variable "nameDay", which is a factor variable. The days are represented as characters ("Saturday", "Monday"...), but I have converted them to factors. Here are the top 6 rows from this dataframe for reproduction:

head(Casual.data) 

      casual    casAvg Year weather season holiday humidity   medWs  nameDay
3131      61 43.907692 2011       1      3       0       42 11.0014  Tuesday
8581       5  1.369231 2012       2      3       0       70  6.0032 Thursday
4452      40 34.153846 2011       1      4       0       77  7.0015   Monday
9610       1  2.828125 2012       1      4       0       73  7.0015   Friday
10235      1  1.421875 2012       1      4       1       76 11.0014   Monday
496        0  2.828125 2011       2      1       0       63  6.0032   Friday
      minTemp   avgHum    stdWs Hour derHum  atemp Day
3131    31.16 54.77778 5.544601   16 -3.500 42.425  19
8581    29.52 65.55556 3.282332    5  1.000 34.090  19
4452    21.32 57.77778 5.598605   20  4.625 25.000  17
9610    22.14 62.77778 3.206137    2  2.000 25.760   5
10235   16.40 71.77778 2.962030    4  1.750 20.455  12
496      5.74 49.55556 3.951886    2  2.875  8.335   4
      maxAtemp maxTemp   stdTemp  stdAtemp  derAtemp derTemp
3131    42.425   36.90 1.7608268 1.7536814  0.757500  0.7175
8581    35.605   31.16 0.7609278 0.7030059 -0.189375 -0.2050
4452    27.275   23.78 0.7609278 0.7033802 -0.189375 -0.2050
9610    31.060   27.06 2.0085816 2.4278610 -0.662500 -0.6150
10235   21.970   18.04 0.6833333 0.6310012 -0.189375 -0.2050
496     12.880    8.20 0.8961833 1.3659498 -0.283750 -0.3075

The function cv.glmnet (from library glmnet) requires that I pass my data as a matrix, and not a dataframe. Therefore, I convert my dataframe into a matrix:

Xcas <- as.matrix(Casual.data[,-1])

I drop the first column because it is my response variable. I then create a numeric vector for the response:

Ycas <- as.numeric(Casual.data$casual)

Finally, I attempt to fit the lasso regression model:

lasso.casual <- cv.glmnet(x = Xcas, y = Ycas, alpha = 1)

I get this error message:

Error in elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian, :
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian, :
  NAs introduced by coercion

I think it is because of the "nameDay" variable in my original dataframe, but I'm not sure. Any ideas on how to fix this?

Thanks

Brian Tompsett - 汤莱恩
JonGor
  • [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – zx8754 May 08 '15 at 13:17

2 Answers

This is an old question but I'm gonna type a quick response anyway for others who might stumble upon this.

As sqluser noted, encoding your weekdays as numeric values effectively treats the seventh day of the week (Sunday in Europe) as carrying seven times the value of the first. That is not desirable, because day of the week is a categorical variable whose codes have no meaningful magnitude. Instead, you should represent its factor levels as separate columns of 0/1 dummy variables, one per weekday. To do this easily, have a look at the stats::model.matrix function.
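
A minimal sketch of that idea, assuming a toy data frame built from three of the question's columns (the name `toy` and the choice of columns are mine, for illustration only). Note that with the default treatment contrasts, model.matrix drops one reference level per factor, so a seven-level nameDay would yield six indicator columns rather than seven:

```r
# Toy stand-in for Casual.data, using values from the rows shown in the question.
toy <- data.frame(
  casual   = c(61, 5, 40, 1, 1, 0),
  humidity = c(42, 70, 77, 73, 76, 63),
  nameDay  = factor(c("Tuesday", "Thursday", "Monday",
                      "Friday", "Monday", "Friday"))
)

# model.matrix() expands each factor into 0/1 indicator columns.
# `casual ~ .` uses every column except the response as a predictor.
# [, -1] drops the "(Intercept)" column, since glmnet fits its own intercept.
X <- model.matrix(casual ~ ., data = toy)[, -1]
y <- toy$casual

is.numeric(X)  # the matrix is numeric, with no character columns
colnames(X)    # humidity plus one indicator per non-reference weekday
```

On the full data, a matrix built this way should be usable as the `x` argument in place of `Xcas`, for example `cv.glmnet(x = X, y = y, alpha = 1)`. The toy frame above is too small for cross-validation, so that call is left out of the sketch.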

Extrapolator

You have factors in the original data.frame ("nameDay").

I'm assuming that when you convert this to a matrix, the factor gets converted to character, and since a matrix can hold only one data type, all of your numeric columns get converted to character as well.

Have you actually checked your matrix after the conversion? I bet that's why you are getting NAs.
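
One quick way to check, sketched on a toy data frame (the names `toy` and `M` are hypothetical; the values come from the question's first rows). as.matrix() on a data frame with a factor column returns a character matrix, and coercing the day names to numeric yields NA, which matches the warning in the question:

```r
# Two columns from the question: one numeric, one factor.
toy <- data.frame(
  humidity = c(42, 70, 77),
  nameDay  = factor(c("Tuesday", "Thursday", "Monday"))
)

# A matrix holds a single type, so the factor forces everything to character.
M <- as.matrix(toy)
is.character(M)  # TRUE
str(M)           # shows a chr matrix, including the humidity column

# Forcing the day names to numeric is what produces the NAs.
day_as_num <- suppressWarnings(as.numeric(M[, "nameDay"]))
day_as_num       # NA NA NA
```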

Since glmnet forces you to convert the data frame to a matrix, I would suggest converting the factor to numeric codes first (1, 2, 3, ..., 7) before converting the data frame to a matrix.

I don't know your data, but depending on whether there is a logical relationship between the levels of the nameDay variable (they are days, so there should be), converting them to numeric 1-7 may or may not have undesirable repercussions in your model.
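
A small sketch of the numeric conversion, with the caveat that a factor's integer codes follow its level order, which is alphabetical unless you set it yourself. The vector `days` and the Monday-first ordering are assumptions for illustration:

```r
# Assumed calendar order, Monday first.
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
nameDay <- c("Tuesday", "Thursday", "Monday", "Friday")

# Default levels are sorted alphabetically: Friday, Monday, Thursday, Tuesday.
default_codes <- as.integer(factor(nameDay))

# Explicit levels give calendar order (Monday = 1, ..., Sunday = 7).
calendar_codes <- as.integer(factor(nameDay, levels = days))

default_codes   # 4 3 2 1
calendar_codes  # 2 4 1 5
```

Either way, the model then treats the day as a single numeric predictor, so the choice of ordering changes what the fitted coefficient means.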

sqluser