
I am working with loglm(count~A+B+C+D+E, data=whatever).

My problem is that I would like to fit a model for every possible combination of the effects. That is: `A`, then `A + A:B`, then `A + C + C:B + A:B:C:D:E`, and so on into (seeming) infinity.

Any suggestions?

EDIT: The data looks something like this:

df <- structure(list(count = c(0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L),  
A = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,  
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), B = c(1L, 1L, 1L, 1L, 1L,  
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,  
2L, 2L, 2L, 2L, 2L, 2L), C = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,  
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),  
D = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,  
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L), E = c(1L, 1L, 2L, 2L, 1L,  
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,  
2L, 1L, 1L, 2L, 2L, 1L)), .Names = c("count", "A", "B", "C", "D", "E"),  
class = "data.frame", row.names = c(NA, -29L))

The problem I get is:

> data(SampleData)
Warning message:
In data(SampleData) : data set ‘SampleData’ not found
> fm1 <- loglm(count ~ ., data = SampleData)
> dd <- dredge(fm1)
Error in rownames(ct)[match(names(coef1), rownames(ct))] <- fxdCoefNames : 
  NAs are not allowed in subscripted assignments
In addition: Warning messages:
1: In table(fac) : attempt to set an attribute on NULL (model 1 skipped)
2: In data[do.call("cbind", lapply(fac, as.numeric))] <- rsp :
  number of items to replace is not a multiple of replacement length
3: In st[do.call("cbind", lapply(fac, as.numeric))] <- exp(offset) :
  number of items to replace is not a multiple of replacement length
4: In double(nmar) : vector size cannot be NA/NaN (model 2 skipped)
5: In data[do.call("cbind", lapply(fac, as.numeric))] <- rsp :
  number of items to replace is not a multiple of replacement length
6: In st[do.call("cbind", lapply(fac, as.numeric))] <- exp(offset) :
  number of items to replace is not a multiple of replacement length
7: In double(nmar) : vector size cannot be NA/NaN (model 3 skipped)
> subset(dd, delta < 4)
Error in subset(dd, delta < 4) : object 'dd' not found
Marek
  • Doesn't a*b*c*d*e give you that? – Ari B. Friedman Apr 16 '12 at 23:36
  • Also, have you read [this thread](http://stackoverflow.com/questions/7383433/how-to-get-all-possible-combinations-of-n-number-of-data-set)? – Eric Fail Apr 16 '12 at 23:39
  • that gives you the one which has all combinations, right? I need loglm to run every possible time. loglm(a) and loglm(a*b*c*d*e) – user1337445 Apr 16 '12 at 23:43
  • So maybe you're describing a stepwise model fitting procedure that goes all the way from loglm(a) up to loglm(a*b*c*d*e)? – joran Apr 16 '12 at 23:49
  • @joran yes I think stepwise is a good way of looking at it, but I'm not sure how to make it print the results of each individual loglm. – user1337445 Apr 17 '12 at 00:49
  • @user1337445, you need to read the warnings. R informs you that `data set ‘SampleData’ not found`. The name of your object is `df`. Have a look at `ls()`; that command lists the objects in your environment … – Eric Fail Apr 17 '12 at 06:36

2 Answers


I believe this would get you what you want:

install.packages('MuMIn', dependencies = TRUE)
library(MuMIn)    

Example from Burnham and Anderson (2002), page 100 (taken from `?dredge`):

data(Cement)
fm1 <- lm(y ~ ., data = Cement)
dd <- dredge(fm1)
subset(dd, delta < 4)

All you have to do is replace `lm(y ~` with `loglm(count ~` and remove all non-explanatory variables from your data.
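If `dredge` does not cooperate with `loglm` objects (as the comments below report), the candidate models can also be enumerated by hand. The following is only a minimal sketch under several assumptions: it covers main-effect subsets only, it uses a Poisson `glm` as a stand-in for `loglm` (the two give the same fitted log-linear model, but this is a swapped-in technique, not what the answer proposes), and `toy` is hypothetical made-up data, not the asker's `df`.

```r
# Hypothetical toy data: a complete 2^5 table with made-up Poisson counts.
set.seed(1)
toy <- expand.grid(A = factor(1:2), B = factor(1:2), C = factor(1:2),
                   D = factor(1:2), E = factor(1:2))
toy$count <- rpois(nrow(toy), lambda = 5)

vars <- c("A", "B", "C", "D", "E")

# Every non-empty subset of the main effects: 2^5 - 1 = 31 of them.
subsets <- unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per subset, e.g. count ~ A + C.
forms <- lapply(subsets, reformulate, response = "count")

# Fit each candidate as a Poisson log-linear model (stand-in for loglm).
fits <- lapply(forms, function(f) glm(f, family = poisson, data = toy))

# Collect one row per model and sort by AIC.
results <- data.frame(
  model    = vapply(forms, function(f) paste(deparse(f), collapse = " "),
                    character(1)),
  deviance = vapply(fits, deviance, numeric(1)),
  AIC      = vapply(fits, AIC, numeric(1))
)
print(results[order(results$AIC), ], row.names = FALSE)
```

Adding interaction terms to `vars` would extend the search, but the number of candidates doubles with every extra term, so the list is best kept short.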

Eric Fail
  • I get an error at the dd <- dredge(fm1) step. Error in rownames(ct)[match(names(coef1), rownames(ct))] <- fxdCoefNames : NAs are not allowed in subscripted assignments In addition: Warning messages: 1: In table(fac) : attempt to set an attribute on NULL (model 1 skipped) 2: In data[do.call("cbind", lapply(fac, as.numeric))] <- rsp : number of items to replace is not a multiple of replacement length 3: In st[do.call("cbind", lapply(fac, as.numeric))] <- exp(offset) : number of items to replace is not a multiple of replacement length etc.. – user1337445 Apr 17 '12 at 00:14
  • @user1337445, it would be helpful if you could supply some sample data. Not necessarily _your_ data, but some data that can reproduce _your_ problem. – Eric Fail Apr 17 '12 at 01:28
  • Thanks, that's a good point. I will update the original post with some data – user1337445 Apr 17 '12 at 01:47
  • @user1337445, are you familiar with the `dput` function? It '[w]rites an ASCII text representation of an R object to a file or connection, or uses one to recreate the object' (`?dput`). You should put some time into your example. That way we don't have to struggle to reproduce your problem, but can focus on solving it. – Eric Fail Apr 17 '12 at 02:00
  • sorry, i'm not really a CS guy and don't have a lot of experience with r and the dput thing is kind of beyond me. i might be able to get help putting it up here tomorrow. – user1337445 Apr 17 '12 at 02:20
  • @user1337445, now that your data is a bit more accessible, the only thing you need to do is show (with R code) how you specify your model and note where you get the error. – Eric Fail Apr 17 '12 at 02:28
  • When I try this with the data you posted above, `dredge` barfs because this model has 32 terms, so there are 2^31=2147483648 possible submodels -- too many to reasonably fit ... (at 100 model fits per second, I compute that this would take about 248 days to fit ...) – Ben Bolker Apr 17 '12 at 02:34
  • yeah, i was thinking that it would be a lot, but that i could sort them afterwards by the X^2. Perhaps it would make more sense for me to try to whittle down the variables to a more manageable number. – user1337445 Apr 17 '12 at 02:41
  • for some reason I find that `count~.` fails in the way that you have posted above, but `count~A*B*C*D*E` fails in the way I described (i.e., unreasonably many models). However, there may be a further problem with `loglm` models -- even with a much smaller model set (i.e. `count~A+B`) I get the `Error in rownames(ct)[match(names(coef1), rownames(ct))] <- fxdCoefNames : NAs are not allowed in subscripted assignments` error ... – Ben Bolker Apr 17 '12 at 02:53
  • PPPS I don't know if it works for `loglm` models or not, but you might look at the `glmulti` package, as featured in http://stackoverflow.com/questions/10182804/using-the-glmulti-package-in-r-for-exhaustive-search-multiple-regression-for-aka – Ben Bolker Apr 17 '12 at 14:53
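
The model-count arithmetic in Ben Bolker's comment can be reproduced in a few lines of base R. This is just a sketch of that back-of-the-envelope calculation; the rate of 100 fits per second is the comment's own assumption, and the variable names are made up for illustration.

```r
# Terms in the full factorial model A*B*C*D*E (no data needed for this).
full <- terms(count ~ A * B * C * D * E)
n_terms <- length(attr(full, "term.labels")) + 1  # 31 effects + intercept

# Every non-intercept term is either in or out of a submodel.
n_models <- 2^(n_terms - 1)

# Time needed at the assumed rate of 100 model fits per second.
fits_per_second <- 100
days <- n_models / fits_per_second / (60 * 60 * 24)

cat("terms:", n_terms, "\n")
cat("submodels:", format(n_models, scientific = FALSE), "\n")
cat("days at 100 fits/s:", round(days, 1), "\n")
```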

As I always say, "What is the problem you are trying to solve?" Presumably you don't actually need all those 2^N results, so what is it you are looking for? Perhaps you want some sort of sieve to zero in on the effects which have the strongest, errr... effect :-) on the outcome?
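One possible "sieve" of this kind is the stepwise search mentioned in the comments on the question. The sketch below is an assumption-laden illustration, not a recommendation: it uses `step()` from base R on a Poisson `glm` (a stand-in for `loglm`), selects by AIC, and runs on hypothetical made-up data called `toy` rather than the asker's `df`. With `trace = 1`, `step()` prints the candidate terms it considered at each stage, which addresses the wish to see each intermediate fit.

```r
# Hypothetical toy data: a complete 2^5 table with made-up Poisson counts.
set.seed(1)
toy <- expand.grid(A = factor(1:2), B = factor(1:2), C = factor(1:2),
                   D = factor(1:2), E = factor(1:2))
toy$count <- rpois(nrow(toy), lambda = 5)

# Start from the intercept-only model.
null_fit <- glm(count ~ 1, family = poisson, data = toy)

# Forward search up to the full factorial model; an interaction is only
# offered once its lower-order terms are already in the model.
sel <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ A * B * C * D * E),
            direction = "forward",
            trace = 1)

# The model the search settled on.
print(formula(sel))
```

Because the toy counts are pure noise, the search may well stop at the intercept-only model; real data with genuine effects would add terms one at a time.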

BTW, you might want to play a bit with Eureqa, a package from http://creativemachines.cornell.edu/eureqa .

Carl Witthoft