3

Say you have an array like

dat <-  array(c(126, 100, 35, 61, 908, 688, 497, 807, 913, 747, 336, 598, 235, 172, 58, 121,402, 308, 121, 215, 182, 156, 72, 98, 60, 99, 11, 43, 104, 89, 21, 36), dim = c(2, 2, 8),dimnames = list(a = c(1, 0), b = c(1, 0), c = 1:8))


> > dat
, , c = 1

   b
a     1  0
  1 126 35
  0 100 61

, , c = 2

   b
a     1   0
  1 908 497
  0 688 807

, , c = 3

   b
a     1   0
  1 913 336
  0 747 598

, , c = 4

   b
a     1   0
  1 235  58
  0 172 121

, , c = 5

   b
a     1   0
  1 402 121
  0 308 215

, , c = 6

   b
a     1  0
  1 182 72
  0 156 98

, , c = 7

   b
a    1  0
  1 60 11
  0 99 43

, , c = 8

   b
a     1  0
  1 104 21
  0  89 36

and you want to fit logistic regression to predict a. Is there a simple way to generate a data frame from this array to use in glm? ie a data frame like

a b c
1 1 1 for 126 rows then
...
0 1 1 for 100 rows, etc.

Basically I need to get data to fit logistic regression when given the table with counts. It seems like there should be a simple way of doing it without manually generating the data.

thanks

thelatemail
  • 91,185
  • 12
  • 128
  • 188
sayhey69
  • 1,089
  • 1
  • 9
  • 15

4 Answers4

4

One way would be to start with the melt function in the reshape2 package:

library(reshape2)

datM <- melt(dat)
head(datM, 2)
#   a b c value
# 1 1 1 1   126
# 2 0 1 1   100

Then dcast that data to get the numbers of outcomes on one row:

dat2 <- dcast(datM, b + c ~ a)
head(dat2, 2)
#   b c   0   1
# 1 0 1  61  35
# 2 0 2 807 497

You can then use this data to perform a glm where the response is a 2-column matrix giving the numbers of successes and failures:

response <- as.matrix(dat2[, c(4, 3)])
bb <- dat2[, "b"]
cc <- dat2[, "c"]
glm1 <- glm(response ~ bb + cc, family = binomial(link = "logit"))

However, the model degrees of freedom (and log-likelihood, etc.) will not reflect the data structure you asked for in your question. To get the specific data structure you were aiming for, you could go back to the datM object.

EDIT:

The following loops over all columns of datM except for the value column, repeating the values datM$value times:

datRep <- lapply(datM[-grep("value", names(datM))], rep, times = datM$value)

Then cbind that back into a matrix and convert to data.frame to get the data structure you wanted:

dat3 <- as.data.frame(do.call(cbind, datRep))

glm2 <- glm(a ~ b + c, data = dat3, family = binomial(link = "logit"))

The coefficients of the two models are the same:

> coef(glm1)
(Intercept)          bb          cc 
-0.43854838  0.77039283 -0.03328575 
> coef(glm2)
(Intercept)           b           c 
-0.43854838  0.77039283 -0.03328575 

But, as mentioned, the degrees of freedom, etc will not be:

> glm1$deviance
[1] 29.39535
> glm2$deviance
[1] 11381.87
BenBarnes
  • 19,114
  • 6
  • 56
  • 74
1

Ugly as sin, but does what you need for this example.

dat1 <- data.frame(value = as.vector(dat),
    a=dimnames(dat)$a,
    b=rep(dimnames(dat)$b, each=length(dimnames(dat)$a)),
    c=rep(dimnames(dat)$c, each=length(dimnames(dat)$a)*length(dimnames(dat)$b)))

Better would be to use melt, as in @BenBarnes answer. This is more flexible and avoids the creation of factors.

dat1 <- melt(dat)

Then to get the expanded rows, you can use rep

dat2 <- data.frame(a=rep(dat1$a, dat1$value),
                   b=rep(dat1$b, dat1$value),
                   c=rep(dat1$c, dat1$value))
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
1

Another alternative using base functions to get the counts data which you can then expand as in @MatthewLundberg's answer:

dat1 <- data.frame(do.call(expand.grid,dimnames(dat)),value=as.vector(dat))

   a b c value
1  1 1 1   126
2  0 1 1   100
3  1 0 1    35
4  0 0 1    61
5  1 1 2   908
...

Expand as stolen from previous answer...

dat2 <- data.frame(a=rep(dat1$a, dat1$value),
                   b=rep(dat1$b, dat1$value),
                   c=rep(dat1$c, dat1$value))
thelatemail
  • 91,185
  • 12
  • 128
  • 188
1

A minimal way to undertake the first part, converting the array to the data.frame is to use as.data.frame.table. Then proceed as @MatthewLundberg or @thelatemail suggest.

df0 <- as.data.frame.table(dat)

head(df0)
#    a b c Freq
# 1  1 1 1  126
# 2  0 1 1  100
# 3  1 0 1   35
# 4  0 0 1   61
# 5  1 1 2  908
# 6  0 1 2  688
guyabel
  • 8,014
  • 6
  • 57
  • 86