lm fitting multiple subsets of a data.frame

Question

I am running some simulations in which I want to fit a linear model to each subset of my data:

library(plyr)  # for mutate() and ddply()

all <- mutate(iris, mean_width = ave(Petal.Width, Petal.Length))
str(all)

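## for each id, minimise sum((y * P(x) - z)^2), where P is a polynomial of degree n;
## here x = Petal.Length, y = Petal.Width, z = mean_width, id = Species,
## and rows with Sepal.Width inside the open interval (exclude[1], exclude[2]) are left out of the fit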

weighted_difference <- function(d, n=4, exclude = c(2.5, 3), ...){

  sub <- subset(d, !(Sepal.Width > exclude[1] &
                     Sepal.Width < exclude[2]))
  fit <- lm(mean_width ~ I(poly(Petal.Length, n, raw=TRUE)*Petal.Width) + Petal.Width - 1,
            data = sub)
  mutate(d, predict = predict(fit, d),
         difference = Petal.Width - predict )
}

results <- ddply(all, "Species", weighted_difference)

This works, but I would like a simpler approach in which I first create a new data.frame containing only the rows used for the fit,

exclude <- c(3, 6)
sub <- subset(all, !(x > exclude[1] & x < exclude[2]))

then fit all groups in a single call,

fits <- lm(z ~ I(poly(x, n, raw=TRUE)*y) + y - 1 | id, data = sub)

(apparently lm() does not accept this `... | id` grouping notation)

and finally call predict on the full data at once,

all <- mutate(all, predict = predict(fits, all), difference = y - predict)

Is there a trick to make lm() work like this, or a better solution? Thanks.

What exactly are you trying to do with `... | id`? If you want to fit random intercepts, you need mixed models, e.g. from the `nlme` or `lme4` package. — Joris Meys, Jun 18 '12 at 08:35
Sorry, I guess using random data isn't very clear; I'll update the example. By `... | id` I mean that the fit should be done separately for each group defined by `id`, as in lattice plots. — baptiste, Jun 18 '12 at 08:57

score 2 · Accepted Answer · answered Jun 18 '12 at 10:02


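Does `lmList()` from the nlme package do what you want? It fits a separate `lm` model to the data in each level of the grouping factor given after `|`: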

library(nlme)
fits <- lmList(z ~ I(poly(x, n, raw=TRUE)*y) + y - 1 | id, data = sub)
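If you would rather keep a single `lm()` call, as the question asks, a standard modelling trick (not part of the accepted answer) is to interact every term with the grouping factor and drop the intercept. The design matrix is then block-diagonal by group, so each Species gets its own n + 1 coefficients, the same (up to rounding) as fitting each group separately, and one `predict()` call covers the full data. This is a sketch assuming the iris setup from the question (x = Petal.Length, y = Petal.Width, z = mean_width, id = Species); the names `all_df`, `fit_all`, `fits_split` and `pred_split` are illustrative only.

```r
# Sketch: one lm() call that fits a separate polynomial model per Species.
# Assumes the iris setup from the question; all_df, fit_all, fits_split and
# pred_split are illustrative names, not from the original post.
all_df <- iris
all_df$mean_width <- ave(all_df$Petal.Width, all_df$Petal.Length)

n <- 4                # polynomial degree, as in the question
exclude <- c(2.5, 3)  # rows with Sepal.Width in this open interval are not fitted
sub <- subset(all_df, !(Sepal.Width > exclude[1] & Sepal.Width < exclude[2]))

# Interacting every term with Species and dropping the intercept gives each
# Species its own n + 1 coefficients; no parameters are shared between groups.
fit_all <- lm(mean_width ~ (I(poly(Petal.Length, n, raw = TRUE) * Petal.Width) +
                              Petal.Width):Species - 1,
              data = sub)

# A single predict() call now covers every row of the full data.
all_df$predict <- predict(fit_all, newdata = all_df)
all_df$difference <- all_df$Petal.Width - all_df$predict

# Cross-check: fit one lm() per Species separately and predict group by group.
fits_split <- lapply(split(sub, sub$Species), function(d)
  lm(mean_width ~ I(poly(Petal.Length, n, raw = TRUE) * Petal.Width) +
       Petal.Width - 1, data = d))
pred_split <- numeric(nrow(all_df))
for (g in names(fits_split)) {
  rows <- all_df$Species == g
  pred_split[rows] <- predict(fits_split[[g]], newdata = all_df[rows, ])
}
stopifnot(isTRUE(all.equal(unname(all_df$predict), pred_split, tolerance = 1e-6)))
```

One difference from separate fits: `summary(fit_all)` estimates a single residual variance pooled over all groups, so its standard errors differ from those of per-group fits even though the coefficients and predictions agree.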


smillig

