Calculating column value using a glm

Question

I have a dataset with about 20,000 lines of data. The data are grouped by year and area. I'm trying to calculate the length at which 50% of shrimp are female. When I run the following code, It just returns the value for the entire dataset, not unique values for each year/area pair.

The dataset is named shrimp.

library(MASS)
library(plyr)

Data

shrimp <- structure(list(cruise = c(1972L, 1972L, 1972L, 1972L, 1972L, 
1972L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 
2003L, 2003L, 1985L, 1985L, 1985L, 1985L, 1985L, 1985L, 1985L, 
1985L, 1985L, 1985L, 1985L, 1985L), areaname = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Balboa", 
"Chiniak", "Pavlof"), class = "factor"), size = c(9, 9.5, 10, 
10.5, 11, 11.5, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 
20, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19
), male = c(49L, 61L, 92L, 57L, 46L, 9L, 151L, 64L, 30L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 638L, 625L, 666L, 447L, 393L, 214L, 119L, 
33L, 12L, 13L, 7L, 6L), female = c(0L, 0L, 0L, 0L, 0L, 0L, 3L, 
15L, 35L, 78L, 122L, 105L, 76L, 28L, 13L, 36L, 0L, 5L, 8L, 13L, 
38L, 60L, 54L, 38L, 28L, 93L, 131L, 195L), total = c(49L, 61L, 
92L, 57L, 46L, 9L, 154L, 79L, 65L, 78L, 123L, 105L, 76L, 28L, 
13L, 36L, 638L, 630L, 674L, 460L, 431L, 274L, 173L, 71L, 40L, 
106L, 138L, 201L)), .Names = c("cruise", "areaname", "size", 
"male", "female", "total"), row.names = c(NA, 28L), class = "data.frame")

Model

cdata <- ddply(shrimp, c("cruise", "areaname"), summarise, 
                    female = dose.p(glm(cbind(female, males) ~ size,
                    family=binomial(logit), data=shrimp), cf=1:2, p =.5))

You are passing the entire dataset in when you write `data=shrimp` — IRTFM, Jan 24 '15 at 01:56
I realized that, but it doesn't work if I delete that. I'm a newbie... What can I write instead? Thanks! — Aaren Ellsworth, Jan 24 '15 at 01:57
You are using more than just `plyr` since I'm pretty sure that `dose.p` is not a `plyr` function. Why not post a real example with proper calls to `library`? — IRTFM, Jan 24 '15 at 02:02
Hi Aaren, have a wee look at this page http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. It details how you may create a reproducible dataset - makes it easier for you to get a working answer. bw — user20650, Jan 24 '15 at 02:20

score 2 · Answer 1 · edited Jan 24 '15 at 03:17

2

There's nothing to test this against (yet) but perhaps:

library(MASS)
library(plyr)
cdata <- ddply(shrimp, c("cruise", "areaname"), summarise, 
           female = dose.p( glm(cbind(female, male) ~ size, 
                         family=binomial("logit") ),  cf=1:2, p = 0.5)
               ) 
##   cruise areaname   female
## 1   1972   Balboa 67.15131
## 2   1985   Pavlof 16.88196
## 3   2003  Chiniak 16.37031

edited Jan 24 '15 at 03:17

Ben Bolker

211,554
25
370
453

answered Jan 24 '15 at 02:09

IRTFM

258,963
21
364
487

Nope. I get error: Error in vector(type, length) : vector: cannot make a vector of mode 'closure'. – Aaren Ellsworth Jan 24 '15 at 02:15
Ben seems to ahve fixed the problem. Testing code against real data objects is the only way to nail down syntactic errors – IRTFM Jan 24 '15 at 05:49

Calculating column value using a glm

1 Answers1