My question is almost answered in "dplyr 0.3.0.9000 how to use do() correctly", but not quite.
I have some data that looks like this:
> head(myData)
   Sequence Index  xSamples ySamples
6         0     5 0.3316187 3.244171
7         0     6 1.5131778 2.719893
8         0     7 1.9088933 3.122991
9         0     8 2.7940244 3.616815
10        0     9 3.6500311 3.519641
The Sequence actually ranges from 0 to 9999. Within each Sequence both the xSamples and the ySamples should be linear with respect to Index. The plan is to group myData by Sequence and then use lm() via do() on each group. The code goes something like this (lifted shamelessly from the help):
library(dplyr)
myData_by_sequence <- group_by(myData, Sequence)
models <- myData_by_sequence %>% do(mod = lm(xSamples ~ Index, data = .))
This works, but the result I get is this . . .
> head(models)
Source: local data frame [10000 x 2]

  Sequence     mod
1        0 <S3:lm>
2        1 <S3:lm>
3        2 <S3:lm>
4        3 <S3:lm>
5        4 <S3:lm>
6        5 <S3:lm>
. . . and the data I want is stuck in that second column. I have a working plyr solution which goes like this . . .
library(plyr)
models <- dlply(myData, "Sequence", function(df) lm(xSamples ~ Index, data = df))
xresult <- ldply(models, coef)
. . . and this gives me the results broken out into a data frame thanks to coef(). The catch is I can't mix dplyr (which I typically use and love) with plyr, and I can't seem to get coef() working with that second column from the dplyr output.
I've tried a few other approaches, such as combining the coef() and lm() steps, and I can break out the second column into a list of linear models, but I can't use do() on a list.
I really feel like there is something obvious I'm missing here. R is definitely not my primary language. Any help would be appreciated.
Edit: I have tried . . .
result <-
  rects %>%
  group_by(Sequence) %>%
  do(data.frame(Coef = coef(lm(xSamples ~ Frame, data = .))))
. . . and get something very close, but with the coefficients stacked in the same column:
  Sequence       Coef
1        0 -5.0189823
2        0  1.0004240
3        1 -4.9411745
4        1  0.9981858
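
One possible way to get one row per Sequence instead of stacked coefficients is to have do() return a one-row data frame with a separate column for each coefficient. The sketch below is only an illustration under assumptions: the toy myData is made up (exactly linear, three sequences) so the example runs on its own, and the column names Intercept and Slope are my own choice, not anything from dplyr or from the real data.

```r
library(dplyr)

# Hypothetical stand-in for myData: three sequences, exactly linear in Index
myData <- data.frame(
  Sequence = rep(0:2, each = 5),
  Index    = rep(5:9, times = 3)
)
myData$xSamples <- -5 + myData$Sequence + myData$Index

result <- myData %>%
  group_by(Sequence) %>%
  do({
    # Fit the per-group model, then spread its two coefficients across columns
    cf <- coef(lm(xSamples ~ Index, data = .))
    data.frame(Intercept = cf[[1]], Slope = cf[[2]])
  })

print(result)
```

Because each group returns a single row, the result should have one row per Sequence with the intercept and slope side by side, which is the shape the plyr ldply(models, coef) call produces.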