I'm building a regression model with several date and numeric variables. As a quick check, I fit a model on one of the date variables:
lm.fit = lm(label ~ Firstday, data = rawdata)
summary(lm.fit)$r.squared
to gauge its predictive influence on the model. It accounted for 41% of the variance (R-squared of 0.41). I then tried converting the date to numeric so that I could work with the variable more easily, using the command
as.numeric(as.POSIXct(rawdata$Firstday, format = "%Y-%m-%d"))
Doing this reduced the variance explained to 10%, which is not what I want. What am I doing wrong, and how should I go about it?
I've looked at https://stats.stackexchange.com/questions/65900/does-it-make-sense-to-use-a-date-variable-in-a-regression but the answer is not clear to me.
Edit 1:
A reproducible code sample of what I did is shown below:
label = c(0,1,0,0,0,1,1)
Firstday = c("2016-04-06", "2016-04-05", "2016-04-04",
             "2016-04-03", "2016-04-02", "2016-04-02", "2016-04-01")
lm.fit <- lm(label ~ Firstday)
summary(lm.fit)$r.squared
[1] 0.7083333
On changing to numeric:
Firstday = as.numeric(as.POSIXct(Firstday, format="%Y-%m-%d"))
I now get
lm.fit <- lm(label ~ Firstday)
summary(lm.fit)$r.squared
[1] 0.1035539
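To make the comparison concrete, here is a minimal sketch that counts how many coefficients each fit estimates. It relies on the fact that `lm()` converts a character predictor to a factor, so the first model gets one dummy variable per distinct date, while the numeric model gets a single slope. The names `fit_chr`, `fit_num`, and `Firstday_num` are introduced here for illustration only, and the sketch uses `as.Date` instead of `as.POSIXct` (an assumption, chosen so the numeric values are whole days that do not depend on the time zone).

```r
label <- c(0, 1, 0, 0, 0, 1, 1)
Firstday <- c("2016-04-06", "2016-04-05", "2016-04-04",
              "2016-04-03", "2016-04-02", "2016-04-02", "2016-04-01")

# Character predictor: lm() treats it as a factor,
# so each distinct date gets its own dummy variable.
fit_chr <- lm(label ~ Firstday)
length(coef(fit_chr))        # intercept + 5 dummies for the 6 distinct dates
summary(fit_chr)$r.squared

# Numeric predictor: days since 1970-01-01, fitted with a single slope.
Firstday_num <- as.numeric(as.Date(Firstday, format = "%Y-%m-%d"))
fit_num <- lm(label ~ Firstday_num)
length(coef(fit_num))        # intercept + 1 slope
summary(fit_num)$r.squared
```

With 7 observations, the first fit spends 6 parameters and the second only 2, so the two R-squared values are not measuring the same kind of model.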