
I'm trying to run an OLS regression on my dataset, but I'm getting an error I haven't seen before, and I'm not sure how to solve it. I've already tried some workarounds from other boards, but nothing has worked so far.

Some more details:

```
str(X.mat.train)
 chr [1:80000, 1:42] "36" "60" "60" "36" "60" "60" "36" "36" "60" "36" "60" "36" "36" "36" "36" "60" "36" "36" "36" "36" "36" "36" "60" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:80000] "855655" "712944" "629936" "264278" ...
  ..$ : chr [1:42] "term" "installment" "grade" "emp_length" ...
str(y.train)
 num [1:80000] 12.99 19.2 11.99 9.67 15.61 ..
```

There are no NAs in this dataset, and X.mat.train has 80,000 rows, but I'm still confused about where the number 3360000 is coming from.
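The 3360000 is exactly 80000 × 42, i.e. one entry per cell of X.mat.train. The sketch below uses a hypothetical 5 × 3 character matrix `X` as a stand-in to show how a character matrix loses its shape when it is turned into a factor: `factor()` drops the `dim` attribute, so you get one long vector with rows × columns elements. This is one plausible way the row count multiplies; it illustrates the arithmetic and is not a trace through `lm()`'s internals.

```r
# Hypothetical 5 x 3 character matrix standing in for X.mat.train (80000 x 42)
X <- matrix(as.character(c(36, 60, 36, 60, 36,
                           1:5,
                           7:11)),
            nrow = 5)

length(X)          # 15 = nrow(X) * ncol(X)
v <- factor(X)     # factor() converts to character first, which drops dim
length(v)          # 15: a plain vector, not 5 rows
is.null(dim(v))    # TRUE

# The same arithmetic for the matrix in the question
80000 * 42         # 3360000
```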

```r
mod.lm.summary <- summary(lm(y.train ~ X.mat.train - 1))
```

```
Error in `[[<-.data.frame`(`*tmp*`, i, value = c(35809L, 35828L, 35828L,  : 
   replacement has 3360000 rows data has 80000
```
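The error is raised by `[[<-.data.frame`, the method R uses when a single column of a data frame is replaced. It refuses a replacement with more elements than the data frame has rows. A minimal sketch with a hypothetical 5-row data frame `df` reproduces the same kind of message on a small scale; in the question, the "data" side is the 80000-row model frame and the "replacement" side has 3360000 elements.

```r
# Hypothetical 5-row data frame, standing in for the 80000-row model frame
df <- data.frame(a = 1:5)

# Replacing a column with 15 values (3x too many) triggers the same check
res <- tryCatch({
  df[["b"]] <- 1:15
  "no error"
}, error = function(e) conditionMessage(e))

print(res)  # a message of the form "replacement has 15 rows, data has 5"
```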
  • Welcome to Stack Overflow! Help others help you by providing a [minimal, complete, verifiable example](https://stackoverflow.com/help/mcve). You can check out [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for some tips and more details. In particular, I suspect others could help you much more easily with more information about your data and/or access to (a subset of) it – duckmayr Feb 09 '19 at 14:25
  • It's saying 3360000 rows because it is reading the matrix as one long vector: `80000 * 42 = 3360000`. Maybe try `lm(y.train ~ ., data = as.data.frame(X.mat.train))`. – Rex Feb 09 '19 at 17:26
  • Hey Rex, yes, that's right. I think the problem was that my X.mat.train was still character (chr) rather than numeric (num), so lm() apparently tried to replace something with the wrong number of values. Thanks for explaining the 3360000! – Laurence Bach Feb 12 '19 at 21:41
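Following the resolution in the comments, here is a minimal sketch of converting a character matrix to numeric before fitting. It assumes every column of X.mat.train holds numeric strings like "36" and "60"; `X.chr`, `y`, `X.num`, and `df` are hypothetical stand-ins. If a column such as `grade` actually holds letters (the `str()` output doesn't show this), `as.double()` would turn it into `NA` with a warning, and that column would be better kept as a factor in a data frame.

```r
set.seed(42)
n <- 10

# Hypothetical stand-ins for X.mat.train and y.train: a character matrix of
# numeric strings (as in the str() output) and a numeric response
X.chr <- cbind(term = as.character(sample(c(36, 60), n, replace = TRUE)),
               rate = as.character(round(runif(n, 5, 20), 2)))
y <- rnorm(n, mean = 12)

# Convert to double in place; dim and dimnames are preserved
X.num <- X.chr
storage.mode(X.num) <- "double"

# Matrix form, as in the question (-1 drops the intercept)
fit.mat <- lm(y ~ X.num - 1)

# Data-frame form, closer to the comment's suggestion; lm's data= argument
# needs a data frame, not a matrix
df <- data.frame(y = y, X.num)
fit.df <- lm(y ~ . - 1, data = df)

summary(fit.mat)
```

Both fits estimate the same coefficients; only the names differ (`X.numterm` in the matrix form versus `term` in the data-frame form).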
