I have a fairly large data frame, about 235K rows, and I want to run a multivariate regression:
model <- lm(var~., data=data)
but I get an error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
Neither na.omit nor any other method of getting rid of NAs helped.
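One possible reading of the warning about coercion is that the response column is stored as text (character or factor) rather than as numbers. In that case na.omit would find nothing to drop, because the NAs only appear when lm converts the column to double. A minimal sketch for checking this, assuming a made-up four-row data frame in place of the real one (the values in it are hypothetical):

```r
# Hypothetical stand-in for the real data; `var` is stored as text here.
data <- data.frame(var = c("1.5", "2", "n/a", "Inf"), x = 1:4,
                   stringsAsFactors = FALSE)

# Is the response numeric, or character/factor?
str(data$var)

# Convert via character so that factor levels are not replaced by integer codes.
y <- suppressWarnings(as.numeric(as.character(data$var)))

# Rows whose response is NA, NaN or +/-Inf after conversion.
bad_rows <- which(!is.finite(y))
data[bad_rows, , drop = FALSE]
```

If bad_rows is non-empty on the real data, those rows are the ones lm.fit is complaining about.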
So I tried to find the NAs myself by splitting the data frame into two parts:
Second UPD
data1 <- data[1:(dim(data)[1]/2), ]
data2 <- data[(dim(data)[1]/2):(dim(data)[1]), ]
and again I get a result from both lm calls, with none of the errors from the previous UPD section! NB: I've restarted RStudio.
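A side note on the split itself: using dim(data)[1]/2 as the end of one range and the start of the other puts the boundary row into both halves, and for an odd number of rows the index is fractional. A sketch of a non-overlapping split, assuming a small made-up data frame in place of the 235K-row one:

```r
# Hypothetical small data frame standing in for the real one.
data <- data.frame(var = rnorm(7), x = rnorm(7))

n    <- nrow(data)
half <- n %/% 2                               # integer division, works for odd n

data1 <- data[seq_len(half), , drop = FALSE]  # rows 1 .. half
data2 <- data[(half + 1):n, , drop = FALSE]   # rows half + 1 .. n

# Check that the two halves together cover every row exactly once.
nrow(data1) + nrow(data2) == n
```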
First UPD
data1 <- data[1:(dim(data)[1]/2),]
and when I call lm on it, instead of the previous error I get the following:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
To reach this error I reduced the data from 235K to 14.5K rows. So what is the problem now? Some of the slices I cut off don't throw any errors.
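The contrasts error usually means that some factor (or character) predictor has only one distinct value left in the rows being fitted, which can happen in a slice even when the full data is fine. A rough sketch for listing such columns, assuming a made-up three-row subset (the column g is hypothetical):

```r
# Hypothetical subset in which the factor `g` has only one level left.
data1 <- data.frame(var = c(1, 2, 3), x = c(0.1, 0.4, 0.2),
                    g = factor(c("a", "a", "a")))

# Which columns are categorical (factor or character)?
is_cat <- vapply(data1, function(col) is.factor(col) || is.character(col),
                 logical(1))

# Count the distinct non-missing values in each categorical column.
n_levels <- vapply(data1[is_cat],
                   function(col) length(unique(na.omit(col))),
                   integer(1))

# Columns with fewer than two values cannot be given contrasts.
single_level <- names(n_levels)[n_levels < 2]
single_level
```

Dropping the listed columns from the formula for that slice, or calling droplevels() after subsetting and checking again, would be one way to test whether this is the cause.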
Original version
data1 <- data[1:(dim(data)[1]/2)]
data2 <- data[(dim(data)[1]/2):(dim(data)[1])]
and call lm for each of them:
model1 <- lm(var~., data=data1)
model2 <- lm(var~., data=data2)
and I receive no errors! So I suppose the problem is the large size of the data frame. Is there any way to fix it?