I am trying to apply a function to a dataset for each combination of several factors. The function takes two arguments. I have tried solutions based on previous questions on conditional summing in R and on the plyr package, without success.

An example is useful. Here, x refers to "events" and y to "responses" for two conditions.

dat <- data.frame(x=c(0,0,1,1,0,0,1,1),
              y=c(2,1,1,2,1,2,1,0),
              g1=c("a","a","a","a","b","b","b","b"),
              g2=c("c","d","c","d","c","d","c","d"))

attach(dat)

I can get counts or sums etc just fine:

numberTrials <- aggregate(y,list(g1,g2),length)
nEvents <- aggregate(x,list(g1,g2),sum)

Now I want to express the number of "2" responses (y==2) to an event (x==1) as a proportion of the total number of events, for each combination of the group factors, i.e. length(y[x==1 & y==2])/sum(x).
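To make the target quantity concrete, here is a minimal sketch that computes it for a single combination of the factors (g1 == "a", g2 == "d"). It rebuilds the example data so it runs on its own; the names one_group and prop_ad are only illustrative.

```r
# Rebuild the example data so the snippet is self-contained
dat <- data.frame(x  = c(0, 0, 1, 1, 0, 0, 1, 1),
                  y  = c(2, 1, 1, 2, 1, 2, 1, 0),
                  g1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                  g2 = c("c", "d", "c", "d", "c", "d", "c", "d"))

# Rows belonging to one combination of the grouping factors
one_group <- dat[dat$g1 == "a" & dat$g2 == "d", ]

# Number of "2" responses to an event, divided by the number of events
prop_ad <- length(one_group$y[one_group$x == 1 & one_group$y == 2]) / sum(one_group$x)
prop_ad
```

The goal is to get this same number for every g1/g2 combination in one call.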

I've tried writing a function to do this calculation, then applying the function to each subset using by:

propFun <- function(events, response) {
  # events is the 0/1 event indicator (x); response is the response code (y)
  nEvents <- sum(events)
  nResp2ToEvent <- length(response[events == 1 & response == 2])
  propFAs <- nResp2ToEvent / nEvents
  return(propFAs)
}

dataProp <- by(dat,list(g1,g2),propFun(events=x),response=y)

However, the call to by produces:

Error in propFun(events = x) : 
argument "response" is missing, with no default

I have been similarly unsuccessful using sapply and ddply.

I'm sure that the error I get has a simple syntactical fix; however I would also be interested in any better solutions to the overall problem. Thanks

tsawallis
  • Stick with Joran's answer, but I think your error message is due to a misplaced ")" . Try `by(dat,list(g1,g2),propFun(events=x ,response=y) )` and see if your code runs. – Carl Witthoft Mar 06 '12 at 20:13
  • Thanks Carl, but I believe I tried that variant too. It produces the error `Error in FUN(X[[1L]], ...) : could not find function "FUN"`. According to `?by`, further arguments to FUN should be supplied after an additional comma in the call to `by`; I haven't been able to get this to work. – tsawallis Mar 07 '12 at 16:50

1 Answer

I think this is what you're after, using ddply and summarise:

library(plyr)
ddply(dat, .(g1, g2), summarise, ev = length(y[x==1 & y==2])/sum(x))

  g1 g2 ev
1  a  c  0
2  a  d  1
3  b  c  0
4  b  d  0
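
If you would rather stay with base R and your own propFun, the following is a sketch of how the by call could be written. It relies on two points: by expects a function as its third argument (the original call passed the result of propFun(events=x), which R tried to evaluate immediately), and by hands that function each subset as a data frame, so the columns have to be pulled out inside a wrapper. The anonymous wrapper function(d) is my own addition, not part of the question's code.

```r
# Example data and function from the question, repeated so this runs on its own
dat <- data.frame(x  = c(0, 0, 1, 1, 0, 0, 1, 1),
                  y  = c(2, 1, 1, 2, 1, 2, 1, 0),
                  g1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                  g2 = c("c", "d", "c", "d", "c", "d", "c", "d"))

propFun <- function(events, response) {
  nEvents <- sum(events)
  nResp2ToEvent <- length(response[events == 1 & response == 2])
  nResp2ToEvent / nEvents
}

# by() calls the wrapper once per g1/g2 subset; d is that subset as a data frame.
# Referring to dat$g1 and dat$g2 explicitly avoids the need for attach(dat).
dataProp <- by(dat, list(g1 = dat$g1, g2 = dat$g2),
               function(d) propFun(events = d$x, response = d$y))
dataProp
```

For the example data this should agree with the ddply result: 1 for the a/d combination and 0 for the others.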
joran