Possible Duplicate:
Averaging column values for specific sections of data corresponding to other column values

I would like to analyze a dataset by group. The data is set up like this:

Group   Result   cens
   A    1.3        1
   A    2.4        0
   A    2.1        0
   B    1.2        1
   B    1.7        0
   B    1.9        0

I have a function that calculates the following:

sumStats = function(obs, cens) {
  detects = obs[cens == 0]
  nondetects = obs[cens == 1]
  mean.detects = mean(detects)
  return(mean.detects)
}

This is of course a simple function for illustration purposes. Is there a function in R that will let me apply this home-made function, which needs two variables as input, to the data by group?

I looked into the by function, but it seems to take only one column of data at a time.

Amateur
  • This is close to the most common question asked here on the [r] tag. There is also an `[r-faq]` tag. [This question](http://stackoverflow.com/questions/11562656/averaging-column-values-for-specific-sections-of-data-corresponding-to-other-col) will give you inspiration. – mnel Dec 09 '12 at 23:30
  • It is not quite the same because I have 2 variables that need to pass through the function. – Amateur Dec 10 '12 at 04:51
  • Once you get the proper subset of the data, there is only one variable to pass to the function. – Matthew Lundberg Dec 10 '12 at 05:44

2 Answers

Import your data:

test <- read.table(header=TRUE,textConnection("Group   Result   cens
   A    1.3        1
   A    2.4        0
   A    2.1        0
   B    1.2        1
   B    1.7        0
   B    1.9        0"))

Though there are many ways to do this, using by specifically you could do something like this (assuming your dataframe is called test):

by(test,test$Group,function(x) mean(x$Result[x$cens==1]))

which will give you the mean of all the Result values within each group that have cens==1.

Output looks like:

test$Group: A
[1] 1.3
----------------------------------------------------------------------
test$Group: B
[1] 1.2

To help you understand how this might work with your function, consider this: If you just ask the by statement to return the contents of each group, you will get:

> by(test,test$Group,function(x) return(x))
test$Group: A
  Group Result cens
1     A    1.3    1
2     A    2.4    0
3     A    2.1    0
----------------------------------------------------------------------- 
test$Group: B
  Group Result cens
4     B    1.2    1
5     B    1.7    0
6     B    1.9    0

...which is effectively two data frames, one per group, each holding only that group's rows, stored in a list-like object. This means you can access parts of each group's data frame just as you would before they were split up. The x in the functions above refers to the whole sub-data-frame for each group, so you can pass individual columns of x to functions. A basic example:

> by(test,test$Group,function(x) x$Result)
test$Group: A
[1] 1.3 2.4 2.1
-------------------------------------------------------------------
test$Group: B
[1] 1.2 1.7 1.9
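
As a rough sketch of the same idea, assuming the data frame is called test as above (the name pieces is only illustrative), you can keep the result of by and pull out one group's sub-data-frame by name:

```r
# Same example data as above
test <- read.table(header = TRUE, text = "Group Result cens
A 1.3 1
A 2.4 0
A 2.1 0
B 1.2 1
B 1.7 0
B 1.9 0")

# Return each group's rows unchanged; 'pieces' is a hypothetical name
pieces <- by(test, test$Group, function(x) x)

# Each element is an ordinary data frame, so columns are accessed as usual
groupA <- pieces[["A"]]
groupA$Result
groupA$cens
```

Here groupA holds only the rows for group A, so any function that takes several columns can be handed groupA$Result and groupA$cens together.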

Now, to finally get around to answering your specific query! If you take an example function which gets the mean of two inputs separately:

sumStats = function(var1, var2) {
   res1 <- mean(var1)
   res2 <- mean(var2)
   output <- c(res1,res2)
   return(output)
}

You could call this using by to get the mean of both Result and cens like so:

> by(test,test$Group,function(x) sumStats(x$Result,x$cens))
test$Group: A
[1] 1.9333333 0.3333333
---------------------------------------------------------------------- 
test$Group: B
[1] 1.6000000 0.3333333
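
The same pattern should carry over to the function in the question. The following is a minimal sketch, assuming the intent was to average only the detected values (cens == 0); detectMean is a hypothetical name used here to avoid clashing with the sumStats example above:

```r
# Same example data as above
test <- read.table(header = TRUE, text = "Group Result cens
A 1.3 1
A 2.4 0
A 2.1 0
B 1.2 1
B 1.7 0
B 1.9 0")

# Hypothetical two-argument function modelled on the one in the question
detectMean <- function(obs, cens) {
  detects <- obs[cens == 0]   # keep only uncensored (detected) values
  mean(detects)
}

# x is the sub-data-frame for one group, so both columns can be passed at once
res <- by(test, test$Group, function(x) detectMean(x$Result, x$cens))
res
```

A more complicated function with the same two arguments can be swapped in for detectMean without changing the by call.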

Hope that is helpful.

thelatemail
  • I used to have these groups in an array and use the apply function. Now I want to put these groups in one data frame, and the apply function doesn't seem to work any more. The real function is actually much more complicated; I just use the mean as an example. The point is that I would like to learn how to pass both the Result and cens variables to the function. – Amateur Dec 10 '12 at 04:56
  • @Amateur - I've made some edits to my answer to hopefully give you some pointers and make things clearer. – thelatemail Dec 10 '12 at 05:40

The aggregate function is designed for this.

aggregate(dfrm$cens, dfrm["Group"], FUN=mean)

You can get the mean value of several columns at once, each within 'Group':

aggregate(dfrm[ , c("Result", "cens") ], dfrm["Group"], FUN=mean)
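
Note that aggregate applies FUN to each column separately, so a function that needs Result and cens together cannot be passed to it directly in this form. One possible alternative, sketched here with base R's split and sapply, is to split the data frame by group and call the function on each piece. The name detectMean is hypothetical, and the sketch assumes the data frame is called dfrm with the columns shown in the question:

```r
# Example data from the question
dfrm <- read.table(header = TRUE, text = "Group Result cens
A 1.3 1
A 2.4 0
A 2.1 0
B 1.2 1
B 1.7 0
B 1.9 0")

# Hypothetical two-argument function: mean of the detected values only
detectMean <- function(obs, cens) {
  mean(obs[cens == 0])
}

# split() gives a named list with one data frame per group;
# sapply() then calls the function on each and collects the results
groups <- split(dfrm, dfrm$Group)
res <- sapply(groups, function(d) detectMean(d$Result, d$cens))
res
```

The result is a named numeric vector with one entry per group, which may be easier to reuse than the printed output of by.
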
IRTFM