I have a data frame with two columns:
var_1 <- seq_len(252)
var_2 <- runif(252) * 1000
my_new_df <- data.frame(var_1, var_2)
names(my_new_df) <- c("Time_values", "Count")
train_poly_data <- my_new_df[1:150, c("Time_values", "Count")]   # training data set
valid_poly_data <- my_new_df[151:200, c("Time_values", "Count")] # validation data set
test_poly_data <- my_new_df[201:252, c("Time_values", "Count")]  # test data set
# fit a polynomial regression model of degree 20
poly_tr<-lm(train_poly_data$Count ~ poly(train_poly_data$Time_values,degree=20,raw = TRUE))
summary(poly_tr)
# calling predict(poly_tr, valid_poly_data) gives the following warnings
Warning messages:
1: 'newdata' had 50 rows but variables found have 150 rows
2: In predict.lm(poly_tr, valid_poly_data) :
prediction from a rank-deficient fit may be misleading
Here is what I need to do:
I need to split the data frame into training, validation, and test sets. Then I want to fit a polynomial regression on the training data and validate it on the validation data.
But I keep getting these warnings when I predict on the validation data. I understand that the first one appears because the training set has 150 rows and the validation set has 50, but how do I fix it? I would also like to find the optimal polynomial degree, since I want to check whether my arbitrarily chosen degree of 20 is reasonable.
Any suggestions or help pointing out my mistake are welcome.
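One direction that may address both points, as a minimal sketch rather than a definitive fix: write the formula with bare column names and pass `data =`, so that `predict()` can look the same names up in `newdata`, then compare candidate degrees by their error on the validation set. The seed, the candidate range `1:10`, the helper names `fit_poly` and `rmse`, and the switch from `raw = TRUE` to orthogonal polynomials (to avoid the rank-deficient fit) are all assumptions made for illustration.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Same shape as the data in the question
my_new_df <- data.frame(Time_values = seq_len(252), Count = runif(252) * 1000)
train_poly_data <- my_new_df[1:150, ]
valid_poly_data <- my_new_df[151:200, ]

# Bare column names plus `data =` let predict() find Time_values in `newdata`
# instead of reusing the 150 training rows captured by train_poly_data$...
# poly() defaults to orthogonal polynomials (raw = FALSE), which are
# numerically better conditioned than raw powers up to Time_values^20.
fit_poly <- function(degree) {
  lm(Count ~ poly(Time_values, degree = degree), data = train_poly_data)
}

# Root mean squared error (hypothetical helper)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

degrees <- 1:10  # assumption: arbitrary candidate range
valid_rmse <- sapply(degrees, function(d) {
  pred <- predict(fit_poly(d), newdata = valid_poly_data)
  rmse(valid_poly_data$Count, pred)
})

# Degree with the lowest validation error
best_degree <- degrees[which.min(valid_rmse)]
print(data.frame(degree = degrees, valid_rmse = valid_rmse))
print(best_degree)
```

Because the validation rows come after the training rows in `Time_values`, each prediction here is an extrapolation, and high-degree polynomials tend to extrapolate poorly. A random split of the rows would be another option if the ordering does not matter.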