Storing lm objects within a data table (In order to use predict)

Question

Following some great questions like this one: "Why is using update on a lm inside a grouped data.table losing its model data?", I'm running a regression per group within a data.table and storing the results as follows:

library(data.table)
DT = data.table(iris)
fit = DT[, list(list(lm(Sepal.Length ~ Sepal.Width + Petal.Length))), by = Species]

However, I would like the output of j to be stored as lm objects, not as a data.table:

class(fit[Species=="setosa"])
# [1] "data.table" "data.frame"
# I would like fit to contain 3 lm objects, not data.tables!

My question is: how can I store 3 lm objects within fit, rather than 3 data tables? I need this because I want to use fit later for out-of-sample prediction (using predict.lm).

For example, I would like to store within the data table an element of the following type:

model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = DT[Species=="setosa"])
class(model)
# [1] "lm"
# I would like the first element of fit to include model -> the model output object
new_data <- DT # just a toy example :) this isn't really the new data
predict(model, new_data)

The value that is stored in `V1` is **not** a `data.table`, but rather a `list` that does indeed store `lm` objects. You should do `class(fit[Species=="setosa", V1])` instead of just `class(fit[Species=="setosa"])`, because the latter checks the whole data set rather than just `V1`. Also, you can check what `V1` stores by simply doing `fit[, lapply(V1, class)]`. Finally, you can easily use `predict` on the values in `V1`. Just do `fit[, lapply(V1, predict, new_data)]` — David Arenburg, Aug 03 '16 at 10:11
I don't understand why you use `data.table` for the fits. Just use `nlme::lmList(Sepal.Length ~ Sepal.Width + Petal.Length | Species, data = iris)`. — Roland, Aug 03 '16 at 10:14
@Roland, eventually I want to iterate over many subgroups. Currently I'm doing it within a for loop; hopefully data.table would do it faster — Yehoshaphat Schellekens, Aug 03 '16 at 10:16
I don't see any speed advantage in using data.table here. The slow part is not the grouping. The time is mostly spent in `lm`. To really get better performance, you should use `lm.fit` (in combination with data.table), but then you of course lose the convenience of using `summary`, `predict` etc. and have to code these yourself. PS: Learn to profile your code. — Roland, Aug 03 '16 at 10:19
@Roland thanks for the profiling tip. I've been using R for years, and this is the first time I'm hearing about this :) — Yehoshaphat Schellekens, Aug 03 '16 at 11:31
Continuing on Roland's point, for this type of exercise, you might consider a parallel `lapply` followed by `rbindlist`. Send different keyed subgroups to different threads. — MichaelChirico, Aug 03 '16 at 17:56
@MichaelChirico can you show me an example? I would love to parallelize that, and I'll consider it a valid answer :) — Yehoshaphat Schellekens, Aug 04 '16 at 06:42
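
A minimal sketch of the approach described in the first comment, showing that the list column already holds lm objects and how one of them can be pulled out and passed to predict. The column name `model` and the variables `setosa_model`, `setosa_pred` and `all_pred` are names chosen for this illustration and do not appear in the question, and `new_data` is only a toy stand-in, as in the question.

```r
library(data.table)

DT <- data.table(iris)

# One fitted model per species, stored in a list column.
# `model` is a name chosen for this sketch; the question's code
# relies on the default column name V1 instead.
fit <- DT[, .(model = list(lm(Sepal.Length ~ Sepal.Width + Petal.Length))),
          by = Species]

# Subsetting rows of fit returns a data.table. The lm object itself is an
# element of the list column, so select the column and take its first element.
setosa_model <- fit[Species == "setosa", model][[1]]
class(setosa_model)

# Out-of-sample prediction with a single stored model
# (new_data is a toy stand-in for real new observations).
new_data <- DT[Species == "setosa"]
setosa_pred <- predict(setosa_model, newdata = new_data)

# Prediction with every stored model: one numeric vector per species.
all_pred <- lapply(fit$model, predict, newdata = new_data)
names(all_pred) <- as.character(fit$Species)
```

Passing `newdata` explicitly means predict relies only on the stored coefficients and terms, so the result does not depend on whatever per-group data the fitted object still references.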
