by group analysis using svyglm in a data.table

Question

I have the following data in a data.table:

        h           x1 y1  swNx11
    1:  1 39.075565717  0  1.03317231703408
    2:  1 40.445951251  0  7.14418755725832
    3:  1 37.800722944  0  0.435946586361557
    4:  1 41.085221504  0  0.381347141150498
    5:  1 36.318077491  0  0.497077163135359
---                                                       
24996: 25 39.110138193  0  0.942922612158002
24997: 25 39.331940413  0  1.42227399208458
24998: 25 37.479473784  0  0.390657876415799
24999: 25 35.892044242  0  0.599937357458247
25000: 25 40.699588303  0  0.486486760245521

I've created a function to analyse them in svyglm:

msmMC <- function(y, x, sw, name) {
  msm <- svyglm(y ~ x, family = quasibinomial(link = "logit"),
                design = svydesign(~ 1, weights = ~ sw))
  out <- cbind("name", coef(summary(msm))[2, 1], coef(summary(msm))[2, 2])
  return(out)
}

msmswNx1<-dt2[,list(dtmsm=list(msmMC(y1, x1, swNx1, Nx1))),by="h"]
outNx1 <- unlist(dt.lm[,msmswNx1])

When I run this function, I get the following error:

Error in [.data.table(dt2, , list(dtmsm = list(msmMC(y1, x1, swNx1, : column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Useage: DT[,sum(colC),by=list(colA,month(colB))]

Yet it works fine with a different model, such as glm or polr. So what is going on here? Why is svyglm so picky about by-group processing with a data.table?

You have replaced one typo with another. The error shows that the call you ran is not the one you've written here, `dt2[,list(dtmsm=list(msmMC(y1, x1, swNx1, Nx1))),by="h"]`, as there is an empty `j` argument in the error message: `[.data.table(dt2, , list(dt`. — mnel, Mar 12 '13 at 01:05
given that you don't specify any strata, PSUs, or replicate weights, i doubt that your data are actually a complex sample survey design.. and if they're not, you have no reason to use the `survey` package or `svyglm` -- instead, simply use the `weights=` argument of the `glm` function. in my experience, `survey` objects do not work cleanly with `data.table` or `ffdf` or any other weird data types. :) — Anthony Damico, Mar 12 '13 at 01:39
@Anthony: Indeed, my data do not come from a survey design. I'm trying to fit an inverse probability weighted model. I'm using survey to get robust standard errors for the beta coefficient in my model. As far as I know, I can't do that with the glm function can I? Any thoughts? — Ashley Naimi, Mar 12 '13 at 02:33
@user1849779 idk. figure out what you want to do before you worry about breaking it down groupwise. :) if you decide you must use the `survey` package, stop using `data.table`. clean up your code and [loop through the survey package like this](http://stackoverflow.com/questions/13402829/r-looping-through-in-survey-package/13406563#13406563). — Anthony Damico, Mar 12 '13 at 02:36
@mnel: I double checked, and there's no typo. In fact, when I replace `msmswNx1<-dt2[,list(dtmsm=list(msmMC(y1, x1, swNx1, Nx1))),by="h"]` with `msmswNx1<-dt2[, ,list(dtmsm=list(msmMC(y1, x1, swNx1, Nx1))),by="h"]` no error is returned. However, the output is senseless. — Ashley Naimi, Mar 12 '13 at 02:39
Clearly there **was** a typo. Provide a reproducible example and we can go from there. — mnel, Mar 12 '13 at 02:48
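
On the question raised in the comments, whether robust standard errors are available without `survey`: one possibility, offered as a sketch rather than a recommendation, is a weighted `glm` fit combined with a sandwich covariance estimate from the `sandwich` package. The package choice, the simulated data and every name below are assumptions that do not come from the thread, and the resulting standard errors need not match those from `svyglm` exactly.

```r
library(sandwich)

set.seed(42)
# Simulated stand-in for one group of the data (hypothetical values)
d <- data.frame(
  x  = rnorm(500, mean = 39, sd = 2),
  sw = runif(500, min = 0.3, max = 2)   # stand-in for the stabilised weights
)
d$y <- rbinom(500, 1, plogis(-1 + 0.05 * (d$x - 39)))

# Weighted logistic fit; quasibinomial avoids warnings about non-integer weights
fit <- glm(y ~ x, family = quasibinomial(link = "logit"), weights = sw, data = d)

# Heteroskedasticity-consistent (sandwich) covariance and standard errors
robust_se <- sqrt(diag(vcovHC(fit, type = "HC0")))
cbind(estimate = coef(fit), robust_se = robust_se)
```

Whether this estimator is appropriate for an inverse probability weighted model is a separate statistical question that the sketch does not settle.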

score 0 · Answer 1 · answered Mar 12 '13 at 00:50

I doubt that this call worked with `lm`, `glm` or `polr` either, because the error comes from argument matching in `[.data.table`, not from the model function.

You will need to wrap the whole `j` expression in `list()`:

dt2[,list(dtmsm=list(msmMC(y1, x1, swNx1, Nx1))),by="h"]

Or perhaps you have just misplaced the `list` call, given that `msmMC` appears to return an object that might be a data.frame, list or data.table:

dt2[,list(dtmsm=msmMC(y1, x1, swNx1, Nx1)),by="h"]
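
For a complete picture, here is a minimal sketch of by-group `svyglm` fits in a data.table. It runs on simulated data, not the asker's `dt2`, and the helper `fit_one` is a hypothetical name. As an assumption about how to make the lookup robust, it bundles each group's vectors into a data frame so that `svydesign()` and `svyglm()` have an explicit `data` argument in which to find the variables.

```r
library(data.table)
library(survey)

set.seed(1)
# Simulated stand-in for dt2: 5 groups of 200 rows (hypothetical values)
dt2 <- data.table(
  h     = rep(1:5, each = 200),
  x1    = rnorm(1000, mean = 39, sd = 2),
  swNx1 = runif(1000, min = 0.3, max = 2)
)
dt2[, y1 := rbinom(.N, 1, plogis(-1 + 0.05 * (x1 - 39)))]

# Fit one weighted logistic model and return the slope and its standard error
fit_one <- function(y, x, sw) {
  d   <- data.frame(y = y, x = x, sw = sw)
  des <- svydesign(ids = ~1, weights = ~sw, data = d)
  fit <- svyglm(y ~ x, design = des, family = quasibinomial(link = "logit"))
  est <- coef(summary(fit))
  list(beta = est["x", "Estimate"], se = est["x", "Std. Error"])
}

# j returns a named list, so each group contributes one row with two columns
res <- dt2[, fit_one(y1, x1, swNx1), by = h]
res
```

Returning a plain named list from `j` gives ordinary numeric columns, which avoids the extra `unlist()` step needed when the results are stored in a list column.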

answered Mar 12 '13 at 00:50

mnel

@user1849779 -- your error message comes from a new typo that you've just introduced..... – mnel Mar 12 '13 at 01:05
